Contents: Information Sciences, 182 Article(s)
Spectral weighted sparse unmixing of hyperspectral images based on framelet transform
Chenguang XU, Hongyu XU, Chunyan YU, and Chengzhi DENG

Hyperspectral sparse unmixing methods have attracted considerable attention, and most current methods operate in the spatial domain; however, the hyperspectral data used by these methods complicate feature extraction owing to scattered information, redundancy, and noisy spatial signals. To improve the robustness and sparsity of the unmixing results of hyperspectral images, a spectral-weighted sparse unmixing method based on the framelet transform (SFSU) is proposed. First, we review the theory of hyperspectral sparse unmixing and the framelet transform. Building on this theory, we develop a framelet-transform-based hyperspectral image unmixing model, to which a spectral-weighted sparse regularization term is added to construct the SFSU. Finally, an alternating direction method of multipliers (ADMM) is presented to solve the SFSU model. According to the experimental results, the signal-to-reconstruction error ratio increases by 12.4% to 1045%, and the probability of success (Ps) remains within 16% error. The proposed model demonstrates better anti-noise and sparsity performance than related sparse unmixing methods and yields better unmixing results.
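For readers who want a concrete picture of the solver family named here, below is a minimal NumPy sketch of a spectrally weighted l1 unmixing problem solved by scaled-form ADMM. It is a simplification: the framelet-domain term is omitted, and the library matrix A, weight vector w, and all parameters are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

def soft(v, t):
    # Elementwise soft-thresholding, the proximal operator of the l1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def weighted_l1_unmix(A, y, w, lam=1e-2, rho=1.0, iters=200):
    # min_x 0.5*||Ax - y||^2 + lam*||w * x||_1, solved by scaled-form ADMM.
    n = A.shape[1]
    Aty = A.T @ y
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))  # factor once, reuse
    x = z = u = np.zeros(n)
    for _ in range(iters):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Aty + rho * (z - u)))
        z = soft(x + u, lam * w / rho)   # spectral weights steer the sparsity
        u = u + x - z
    return z

# Stand-in demo: 100 bands, a 20-spectrum library, 2 active endmembers.
rng = np.random.default_rng(0)
A = rng.random((100, 20))
x_true = np.zeros(20); x_true[[3, 7]] = [0.6, 0.4]
y = A @ x_true + 0.01 * rng.standard_normal(100)
print(weighted_l1_unmix(A, y, np.ones(20)).round(2))
```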

Optics and Precision Engineering
May. 10, 2023, Vol. 31 Issue 9 1404 (2023)
High-resolution optical satellite panchromatic and multispectral image geometric positioning consistency correction based on high frequency correction attitude
Yang BAI, Hongyu WU, Lingli WANG, Qianqian BA, Ying PENG, Xing ZHONG, Zhongfu YE, and Guanzhou CHEN

Optical satellites with high attitude measurement accuracy, high attitude stability, and detectors that use mechanical staggered stitching exhibit small geometric positioning errors between panchromatic and multispectral data when affected by slight high-frequency attitude errors. In this paper, a geometric positioning consistency correction method for high-resolution optical satellite panchromatic and multispectral images based on the high-frequency-corrected attitude is proposed to address this problem, and the method is validated using in-orbit satellite data. First, a rigorous geometric model is built according to the principle of the push-broom satellite. Second, the time-sharing imaging characteristics of mechanical staggered stitching detectors are exploited: homonymous image points are obtained by combining geometric positioning constraints with a pyramid image search matching strategy, and these points are used to recover the high-frequency attitude data of the satellite during imaging. Finally, the recovered high-frequency attitude data are used in the sensor geometry correction of the multispectral images to obtain multispectral data corrected by the high-frequency-corrected attitude. The results indicate that the proposed method effectively eliminates the small geometric positioning error between panchromatic and multispectral data caused by slight high-frequency attitude errors, so that the geometrically corrected multispectral and panchromatic data have high-precision geometric positioning consistency. The method improves the relative geometric positioning error in the row direction between the panchromatic and multispectral data to better than 0.15 multispectral pixels, laying a solid foundation for producing high-precision image fusion products from high-resolution optical satellites with high attitude measurement accuracy, high attitude stability, and mechanical staggered stitching detectors.

Optics and Precision Engineering
May. 10, 2023, Vol. 31 Issue 9 1390 (2023)
Application of SENet generative adversarial network in image semantics description
Zhongmin LIU, Heng CHEN, and Wenjin HU

An SENet-based generative adversarial network method for image semantic description is proposed to address inaccurate descriptive sentences and the lack of emotional content in image captions. The method first adds a channel attention mechanism to the feature extraction stage of the generator model so that the network can fully extract features from salient regions of the image, and it inputs the extracted image features into the encoder. Second, a sentiment corpus is added to the original text corpus, and word vectors are generated through natural language processing. These word vectors are then combined with the encoded image features and input to the decoder, which, through continuous adversarial training, generates sentiment-bearing description sentences that match the content depicted in the image. The proposed method is compared with existing methods through simulation experiments, and it improves the BLEU metric by approximately 15% compared with the SentiCap method; improvements in other related metrics are also noted. In self-comparison experiments, the method exhibits an improvement of approximately 3% in the CIDEr metric. Thus, the proposed network can better extract image features, resulting in more accurate description sentences and richer emotional content.
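The channel attention added to the generator is, per the title, from the SENet family; the following is a minimal squeeze-and-excitation block in PyTorch, following the standard SENet design rather than the authors' exact configuration (the reduction ratio of 16 is the usual default, assumed here).

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        n, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))           # squeeze: global average pooling
        w = self.fc(s).view(n, c, 1, 1)  # excitation: per-channel weights
        return x * w                     # recalibrate the feature map

feats = SEBlock(64)(torch.randn(2, 64, 32, 32))  # e.g. inside the generator
```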

Optics and Precision Engineering
May. 10, 2023, Vol. 31 Issue 9 1379 (2023)
Road traffic sign recognition algorithm based on improved YOLOv4
Daxiang LI, Zhongheng SU, and Ying LIU

To address the low recognition accuracy caused by large scale changes of traffic signs in complex scenes, an improved YOLOv4 algorithm is proposed. First, an attention-driven scale-aware feature extraction module is designed: a hierarchical connection mode similar to the residual structure widens the range of receptive fields in each layer to obtain finer-grained multi-scale features, after which a pair of direction-aware and position-sensitive attention maps is generated under the attention drive so that the network can focus on the most discriminative key areas. Next, a feature-aligned pyramid convolution feature fusion module is constructed, in which the feature offset between adjacent-scale feature maps is obtained via convolution for feature alignment. Finally, the network adaptively learns the optimal feature fusion mode through pyramid convolution and constructs a feature pyramid to identify traffic signs of different scales. Experimental results indicate that the recognition accuracy on the TT100K dataset is improved by 5.4% over the original YOLOv4 algorithm and is superior to that of other recognition algorithms, while the frame rate reaches 33.17 FPS. The proposed algorithm thus satisfies the accuracy and real-time requirements of road traffic sign recognition.
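The "hierarchical connection mode similar to the residual structure" that widens per-layer receptive fields reads like the Res2Net split-and-stack pattern; under that assumption, here is a generic sketch (the scale count and channel sizes are illustrative, not the paper's settings).

```python
import torch
import torch.nn as nn

class HierarchicalConv(nn.Module):
    # Each 3x3 conv sees its own channel split plus the previous split's
    # output, so successive splits accumulate larger receptive fields.
    def __init__(self, channels, scales=4):
        super().__init__()
        assert channels % scales == 0
        width = channels // scales
        self.scales = scales
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, 3, padding=1) for _ in range(scales - 1))

    def forward(self, x):
        xs = torch.chunk(x, self.scales, dim=1)
        ys, prev = [xs[0]], None
        for i, conv in enumerate(self.convs):
            prev = conv(xs[i + 1] if prev is None else xs[i + 1] + prev)
            ys.append(prev)
        return torch.cat(ys, dim=1)      # same channel count as the input
```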

Optics and Precision Engineering
May. 10, 2023, Vol. 31 Issue 9 1366 (2023)
Multi-lane line detection and tracking network based on spatial semantics segmentation
Jinpeng SHI, and Xu ZHANG

Deep-learning-based target detection networks exhibit several problems in lane line recognition, such as unclear lane differences, low recognition accuracy, a high false detection rate, and a high missed detection rate. To solve these problems, a lightweight lane detection and tracking network, SCNNLane, based on spatial instance segmentation, was proposed. In the coding part, the VGG16 network and a spatial convolutional neural network (SCNN) were applied to improve the ability of the network structure to learn spatial relationships, which solved the problems of blurring and discontinuity in lane prediction. Simultaneously, based on LaneNet, the two branch tasks after the encoded output were coupled to improve poor foreground-background discrimination and the indistinguishability between lanes. Finally, the method was compared with five other semantic-segmentation-based lane line algorithms using the TuSimple dataset. Experimental results show that the accuracy of this algorithm is 97.12%, and the false detection rate and missed detection rate are reduced by 44.87% and 12.7%, respectively, compared with LaneNet, thus meeting the demands of real-time lane line detection.
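SCNN's defining operation is slice-by-slice message passing across the feature map, which is what lets the network propagate lane evidence along thin, elongated structures. Below is a minimal sketch of the downward pass (one of SCNN's four directions); the kernel width and channel count are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCNNDown(nn.Module):
    def __init__(self, channels, kernel_width=9):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, (1, kernel_width),
                              padding=(0, kernel_width // 2), bias=False)

    def forward(self, x):                      # x: (N, C, H, W)
        rows = list(torch.split(x, 1, dim=2))  # H slices of shape (N, C, 1, W)
        for i in range(1, len(rows)):          # each row hears the row above
            rows[i] = rows[i] + F.relu(self.conv(rows[i - 1]))
        return torch.cat(rows, dim=2)
```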

Optics and Precision Engineering
May. 10, 2023, Vol. 31 Issue 9 1357 (2023)
Automatic extraction of red raspberry planting areas using time series multispectral images
Zhipeng WANG, and Xiaofei WANG

The raspberry has the reputation of being "the third generation of gold fruits." Obtaining accurate data on the raspberry planting area is of great significance for adjusting the crop planting structure and industrial development in Shangzhi, known as the hometown of the red raspberry. Taking Zhoujiayingzi village, Weihe town, Shangzhi city, Heilongjiang province as the study area, Sentinel-2 data with high spatial and temporal resolution were used to obtain time series data of the study area. Using time-series changes in spectral characteristics and the normalized difference vegetation index (NDVI), the CART algorithm was used to estimate the raspberry planting area in the study area. A comparison with the planting areas obtained from multi-temporal remote sensing images alone was performed to explore the effect of including NDVI time-series data on area extraction accuracy, and the method was also compared with object-oriented classification and support vector machine classification based on optimal time-phase data. The experimental results show that the two methods based on the time-series CART algorithm obtain better results than the other two classification algorithms in extracting the raspberry planting area and can obtain the planting area and spatial distribution of crops with higher accuracy, meeting the needs of crop monitoring. Adding NDVI time-series data to the multi-temporal data classification enlarges the spectral difference between crops and improves the classification accuracy: compared with using Sentinel-2 multi-temporal data alone, the classification accuracy is improved by 1.67% and the Kappa coefficient by 0.02.
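As a sketch of the classification pipeline described here, the snippet below computes NDVI from Sentinel-2-style bands and trains a CART classifier (scikit-learn's DecisionTreeClassifier is CART-style). The arrays are random stand-ins for the per-pixel multi-temporal spectra and NDVI time series; the real features would come from the study-area imagery.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def ndvi(nir, red, eps=1e-9):
    # Sentinel-2: NIR is band 8, red is band 4.
    return (nir - red) / (nir + red + eps)

rng = np.random.default_rng(0)
spectra = rng.random((1000, 4 * 6))                   # 4 bands x 6 dates, stand-in
ndvi_ts = ndvi(rng.random((1000, 6)), rng.random((1000, 6)))
X = np.hstack([spectra, ndvi_ts])                     # append NDVI time series
y = rng.integers(0, 4, 1000)                          # 4 land-cover classes
clf = DecisionTreeClassifier(max_depth=10).fit(X, y)  # CART (Gini by default)
# Planting area ~ (predicted raspberry pixels) x (pixel ground area).
```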

Optics and Precision Engineering
Apr. 10, 2023, Vol. 31 Issue 7 1096 (2023)
Image fusion of dual-discriminator generative adversarial network and latent low-rank representation
Daiyu YUAN, Lihua YUAN, Tengyan XI, and Zhe LI

To improve the visual effect of infrared and visible image fusion, images from the two sources were decomposed by latent low-rank representation into low-rank images and sparse images with noise removed. To obtain the fused sparse image, the Karhunen-Loève (KL) transform was used to determine the weights for a weighted fusion of the sparse components. A dual-discriminator generative adversarial network was then designed: the low-rank component features of the two sources were extracted through the VGG16 network as the inputs of the network, and the fused low-rank image was generated through the adversarial game between the generator and the discriminators. Finally, the fused sparse image and the fused low-rank image were superimposed to obtain the final fusion result. Experimental results showed that on the TNO dataset, compared with the five listed advanced methods, the proposed method increased the five indicators of entropy, standard deviation, mutual information, sum of difference correlation, and multi-scale structural similarity by 2.43%, 4.68%, 2.29%, 2.24%, and 1.74%, respectively. For the RoadScene dataset, only two metrics, namely the sum of difference correlation and multi-scale structural similarity, were optimal; the other three metrics were second only to the GTF method. However, the image visualization effect was significantly better than that of the GTF method. Based on subjective and objective evaluation, the proposed method can obtain high-quality fused images and has obvious advantages over the comparison methods.
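The KL-transform weighting of the two sparse components can be sketched as a 2x2 principal component computation: the eigenvector with the largest eigenvalue of the components' covariance matrix, normalized, gives the fusion weights. This is a minimal reading of the abstract, not the authors' exact procedure.

```python
import numpy as np

def kl_fusion_weights(s1, s2):
    # Treat the two sparse components as a 2-D variable; the dominant
    # eigenvector of their covariance (the KL / principal direction),
    # normalized to sum to one, serves as the fusion weights.
    cov = np.cov(np.stack([s1.ravel(), s2.ravel()]))
    vals, vecs = np.linalg.eigh(cov)
    v = np.abs(vecs[:, np.argmax(vals)])
    return v / v.sum()

s_ir, s_vis = np.random.rand(64, 64), np.random.rand(64, 64)  # stand-ins
w = kl_fusion_weights(s_ir, s_vis)
fused_sparse = w[0] * s_ir + w[1] * s_vis
```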

Optics and Precision Engineering
Apr. 10, 2023, Vol. 31 Issue 7 1085 (2023)
Pneumonia aided diagnosis model based on dense dual-stream focused network
Tao ZHOU, Xinyu YE, Huiling LU, Yuncan LIU, and Xiaoyu CHANG

X-ray images play an important role in the diagnosis of pneumonia, but they are susceptible to noise pollution during imaging, which makes pneumonia imaging features inconspicuous and leads to insufficient extraction of lesion features. A dense dual-stream focused network, DDSF-Net, is proposed in this paper as an aided diagnosis model for pneumonia to address these problems. First, a residual multi-scale block is designed: a multi-scale strategy improves the adaptability of the network to pneumonia lesions of different sizes in medical images, and a residual connection improves the efficiency of network parameter transfer. Second, a dual-stream dense block is designed as a dense unit with a parallel structure comprising a global information stream and a local information stream, in which a transformer learns global contextual semantic information while the convolutional layer performs local feature extraction; deep and shallow feature fusion of the two streams is achieved using dense connections. Finally, focus blocks with a central attention operation and a neighborhood interpolation operation are designed: background noise is filtered by cropping the medical image, and detailed lesion features are enhanced by magnifying interpolation. In comparison with typical models on a pneumonia X-ray dataset, the proposed model obtained better performance, with 98.12% accuracy, 98.83% precision, 99.29% recall, 98.71% F1, 97.71% AUC, and a training time of 15729 s. Compared with DenseNet, ACC and AUC improved by 4.89% and 4.69%, respectively. DDSF-Net effectively alleviates the problems of inconspicuous pneumonia imaging features and insufficient extraction of lesion features. The validity and robustness of the model are further verified using a heat map and three public datasets.
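One plausible reading of the dual-stream dense block is a parallel transformer/convolution unit whose outputs are densely concatenated with the input; the sketch below follows that reading only, with illustrative sizes, and is not the DDSF-Net definition.

```python
import torch
import torch.nn as nn

class DualStreamBlock(nn.Module):
    def __init__(self, c, heads=4):
        super().__init__()
        # Global stream: a transformer layer over flattened spatial tokens.
        self.attn = nn.TransformerEncoderLayer(d_model=c, nhead=heads,
                                               batch_first=True)
        # Local stream: plain convolutional feature extraction.
        self.conv = nn.Sequential(nn.Conv2d(c, c, 3, padding=1),
                                  nn.BatchNorm2d(c), nn.ReLU(inplace=True))

    def forward(self, x):                       # x: (N, C, H, W)
        n, c, h, w = x.shape
        g = self.attn(x.flatten(2).transpose(1, 2))
        g = g.transpose(1, 2).view(n, c, h, w)  # global context stream
        l = self.conv(x)                        # local detail stream
        return torch.cat([x, g, l], dim=1)      # dense-style concatenation

out = DualStreamBlock(32)(torch.randn(1, 32, 28, 28))  # out: (1, 96, 28, 28)
```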

Optics and Precision Engineering
Apr. 10, 2023, Vol. 31 Issue 7 1074 (2023)
Terahertz spectral features detection and accuracy identification of explosives in high humidity environment
Ziwei ZENG, Shangzhong JIN, Hongguang LI, Li JIANG, and Junwei CHU

The fingerprint characteristics of the terahertz absorption spectra of materials are widely used in material identification, but the strong absorption of terahertz waves by water vapor in the actual atmospheric environment causes the spectrum to oscillate severely, producing increasing numbers of false, weak, and aliased peaks. These phenomena seriously affect the accuracy of peak-finding comparison and the ability to identify substances. To address this, terahertz absorption spectra of explosives were extracted at relative humidities of 2%, 15%, 35%, 45%, and 60%, and the continuous wavelet transform was applied in the frequency domain to obtain unique characteristic scale maps. Network training was then carried out on the frequency-domain scale maps of explosives obtained under the above five humidity conditions, using a deep learning method with the ResNet-50 model as the basic network structure; the classification accuracy on the test set reached 96.6%. To verify the effectiveness of the technique on untrained humidity samples, the time-domain signals of explosives at relative humidities of 50%, 55%, and 67% were fed into the identification system; the classification accuracy reached 96.2%. Experiments show that this new terahertz material identification method, based on the wavelet transform and ResNet-50 classification training, greatly improves the accuracy of material identification in high-humidity environments compared with the traditional peak-finding method. In addition, it avoids a series of complex preprocessing operations such as noise reduction and smoothing, and considerably expands the engineering adaptability of terahertz spectral detection technology. It supports the accurate detection and identification of mines and other explosives in high-humidity and extremely complex special-operations environments such as mountains, forests, and depressions.
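The pipeline, in which a continuous wavelet transform turns each 1-D absorption spectrum into a 2-D scale map that ResNet-50 then classifies, can be sketched as follows with PyWavelets and torchvision. The Morlet wavelet, the scale range, and the five-class head are assumptions, and the signal is a random stand-in for a THz trace.

```python
import numpy as np
import pywt
import torch
from torchvision.models import resnet50

def scalogram(signal, scales=np.arange(1, 129), wavelet="morl"):
    coefs, _ = pywt.cwt(signal, scales, wavelet)   # continuous wavelet transform
    img = np.abs(coefs)
    return (img - img.min()) / (img.max() - img.min() + 1e-12)

sig = np.random.randn(1024)                        # stand-in absorption trace
img = torch.tensor(scalogram(sig), dtype=torch.float32)
x = img.unsqueeze(0).repeat(3, 1, 1).unsqueeze(0)  # grayscale -> 3-channel batch
model = resnet50(num_classes=5)                    # 5 explosive classes, assumed
logits = model(x)
```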

Optics and Precision Engineering
Apr. 10, 2023, Vol. 31 Issue 7 1065 (2023)
Sand-dust image enhancement using RGB color balance method
Yuan DING, and Kaijun WU

This paper proposes a sand-dust image enhancement algorithm based on an RGB color balance method to address the color cast, low contrast, and low definition of outdoor images captured in dusty environments. The method comprises two main tasks: color correction and contrast enhancement. First, in view of the particular color distribution of dust images and the illumination assumption of the gray-world algorithm, an RGB color balance method (RGBCbm) that maintains the mean value of the color components is proposed: the three RGB channels are stretched according to the mean value of the color components, which effectively removes the color cast caused by dust. The multi-scale retinex with color restoration (MSRCR) algorithm is then used to refine the color correction results. Subsequently, the relative global histogram stretching (RGHS) method, combined with the Lab color model, is used to enhance and correct the contrast, color, and brightness of the image. Tests of the proposed algorithm on experimental data show that it can effectively solve the color cast problem in various dust-degraded images and enhance the clarity of image details while improving color richness and contrast. In quantitative comparison with other advanced algorithms, the highest underwater image quality measure (UIQM) and image contrast index (Conl) reach 0.602 and 0.994, respectively, which are 0.140 and 0.018 higher than those of the other algorithms.
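The gray-world flavor of the per-channel stretch can be sketched in a few lines: each RGB channel is rescaled so that its mean moves toward the shared mean of the three channel means, which removes the dust cast. The exact RGBCbm stretch may differ; this is the generic gray-world gain only.

```python
import numpy as np

def rgb_color_balance(img):
    # img: HxWx3 uint8. Gain each channel toward the mean of channel means.
    f = img.astype(np.float64)
    means = f.reshape(-1, 3).mean(axis=0)        # per-channel means
    gains = means.mean() / (means + 1e-6)        # gray-world-style gains
    return np.clip(f * gains, 0, 255).astype(np.uint8)

dusty = (np.random.rand(120, 160, 3) * 255).astype(np.uint8)  # stand-in image
balanced = rgb_color_balance(dusty)
```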

Optics and Precision Engineering
Apr. 10, 2023, Vol. 31 Issue 7 1053 (2023)
Design of channel attention network and system for micro target measurement
Yangwei FU, Jin ZHANG, Zhenxi SUN, Rui ZHANG, Weishi LI, and Haojie XIA

Microdevices are widely used in the electronics industry. However, the diffraction effect, which causes misalignment between the physical and optical edges of microdevices, poses challenges for detection and measurement. To address this issue, this study combines image super-resolution reconstruction with target measurement, proposing an image super-resolution reconstruction algorithm based on edge enhancement and building a corresponding measurement system. A new quality evaluation parameter is proposed for image super-resolution reconstruction to demonstrate the feasibility of super-resolution reconstruction for improving target measurement accuracy. Targeting the object edges, a channel attention mechanism is also introduced into the network to enhance its ability to reconstruct image edges. Finally, the target measurement system is designed and built, and experiments are carried out. The results show that the proposed algorithm achieves higher peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) values on an open dataset. In real-world measurements, the algorithm improved the limiting resolution of the original measurement system by 25.9% and the target measurement accuracy by 51.6%, on average. This study provides a potential direction for the development of micro-target detection and measurement in industrial production.

Optics and Precision Engineering
Mar. 25, 2023, Vol. 31 Issue 6 962 (2023)
Lightweight vehicle detection using long-distance dependence and multi-scale representation
Xiuping JING, and Ying TIAN

Vehicle detection based on deep learning plays a vital role in many fields and has been a major development direction in computer vision in recent years. Lightweight vehicle detection involves the exploration of network structure and computing efficiency, and it is widely used in fields such as intelligent transportation. However, challenges remain in different scenarios, such as large changes in vehicle scale in detection cameras and vehicles overlapping each other, which reduce the detection precision of the network. To solve these problems, this study proposes an improved YOLOv5s method for detecting vehicles. First, long-distance dependencies between objects are captured through a visual attention network, and new weights are applied to the network's original feature map to increase its adaptability; these operations improve the anti-occlusion ability of the network. Second, a horizontal residual connection is constructed in the residual module, so that the output feature maps of each module contain the same number of receptive fields of different sizes. Feature extraction thus occurs at a finer-grained level, enriching the multi-scale representation ability of the network. The experimental results show that the improved network achieves a 2.1% mAP improvement on vehicle data from the Pascal visual object classes (VOC) dataset and a 1.7% mAP improvement on vehicle data from the MS COCO dataset. The improved network is more powerful, its anti-occlusion ability is enhanced, and its detection results are more competitive than those of the original network.

Optics and Precision Engineering
Mar. 25, 2023, Vol. 31 Issue 6 950 (2023)
Bone scintigraphic classification method based on ACGAN and transfer learning
Hong YU, Renze LUO, Chunmeng CHEN, Xiang TANG, and Renquan LUO

Owing to the limited availability of samples and unbalanced categories of bone scintigraphy images, classifying these images is difficult. To improve the classification accuracy of bone images, this study developed a bone-image classification method based on auxiliary classifier generative adversarial network (ACGAN) data generation and transfer learning. First, a multi-attention U-Net-based ACGAN (MU-ACGAN) model was designed to address the imbalance of bone-image categories. The model uses U-Net as the generator framework and combines dense residual connections with a channel-spatial attention mechanism to improve the generation of bone-image detail features. The discriminator extracts bone-image features using dense residual attention convolution blocks for discrimination. Next, the amount of data was further expanded in combination with traditional data enhancement methods. Finally, a multi-scale convolutional neural network was designed to extract bone-image features at different scales to improve the classification effect. In the model training process, a two-stage transfer learning method was adopted to optimize the initialization parameters of the model and address overfitting. Experimental results indicate that the classification accuracy of the proposed method reaches 85.71%, effectively alleviating the problem of low classification accuracy on small-sample bone-image datasets.
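The ACGAN objective behind MU-ACGAN combines a real/fake source loss with an auxiliary class-label loss; below is the generic discriminator-side loss (a textbook ACGAN sketch, not the MU-ACGAN specifics; the logit and label names are placeholders).

```python
import torch
import torch.nn.functional as F

def acgan_d_loss(real_src, real_cls, real_y, fake_src, fake_cls, fake_y):
    # Source loss: distinguish real bone images from generated ones.
    ls = (F.binary_cross_entropy_with_logits(real_src, torch.ones_like(real_src))
          + F.binary_cross_entropy_with_logits(fake_src, torch.zeros_like(fake_src)))
    # Auxiliary loss: classify the image category for both real and fake.
    lc = F.cross_entropy(real_cls, real_y) + F.cross_entropy(fake_cls, fake_y)
    return ls + lc
```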

Optics and Precision Engineering
Mar. 25, 2023, Vol. 31 Issue 6 936 (2023)
Deep learning image denoising based on multi-stage supervised with Res2-Unet
Yan LIU, Gang CHEN, Chunyu YU, Shiyun WANG, and Bin SUN

To restore high-quality images from different types of noisy images, this study developed a multi-stage supervised deep residual (MSDR) neural network based on Res2-Unet-SE. First, the image denoising task was devised as a multi-stage process. In each processing stage, image blocks with different resolutions were input into a Res2-Unet sub-network to obtain feature information at different scales, and adaptively learned feature fusion information was transferred to the next stage through a channel attention mechanism. Finally, the feature information of different scales was superimposed to achieve high-quality image noise reduction. The BSD400 dataset was selected for training, a Gaussian noise reduction test was performed using the Set12 dataset, and a real-noise reduction test was conducted using the SIDD dataset. Compared with the denoising convolutional neural network (DnCNN), the peak signal-to-noise ratio (PSNR) of the proposed algorithm improved by 0.03 dB, 0.05 dB, and 0.14 dB when Gaussian noise of σ = 15, 25, and 50, respectively, was added to the image dataset. Compared with the latest dual residual block network (DuRN) algorithm, the PSNR of the image denoised using the proposed algorithm was higher by 0.06 dB, 0.57 dB, and 0.39 dB, respectively. For images containing real noise, the PSNR of the image denoised by the proposed algorithm was 0.6 dB higher than that of the convolutional blind denoising network (CBDNet) algorithm. The results indicate that the proposed algorithm is highly robust in image denoising: it can effectively remove noise, restore image details, and fully maintain the global dependence of the image.

Optics and Precision Engineering
Mar. 25, 2023, Vol. 31 Issue 6 920 (2023)
Vehicle detection method based on remote sensing image fusion of superpixel and multi-modal sensing network
Yuanfeng LIAN, Guangyang LI, and Shaochen SHEN

A remote sensing image vehicle detection method combining superpixels and a multi-modal perception network is proposed to address the reduction in recognition accuracy caused by background interference, target density, and target heterogeneity in remote sensing vehicle detection. First, based on region merging rules for hybrid superpixels, a superpixel bipartite graph fusion algorithm was used to fuse the superpixel segmentation results of the two modalities, improving the accuracy of the superpixel segmentation results for different modal images. Second, MEANet, a vehicle detection method for remote sensing images based on a multi-modal edge-aware network, was proposed, and an optimized feature pyramid network module was introduced to enhance the network's ability to learn multi-scale target features. Finally, the two sets of edge features generated by the superpixel and multi-modal fusion modules were aggregated through an edge perception module to generate accurate boundaries of vehicle targets. Experiments were conducted on the ISPRS Potsdam and ISPRS Vaihingen remote sensing image datasets, with final scores of 91.05% and 85.11%, respectively. The experimental results show that the proposed method has good detection accuracy and application value for high-precision vehicle detection in multi-modal remote sensing images.

Optics and Precision Engineering
Mar. 25, 2023, Vol. 31 Issue 6 905 (2023)
Ship detection for complex scene images of space optical remote sensing
Xinwei LIU, Yongjie PIAO, Liangliang ZHENG, Wei XU, and Haolin JI

When deep-learning-based target detection algorithms are applied directly to the complex scene images generated by space optical remote sensing (SORS), ship detection performance is often poor. To address this problem, this paper proposes an improved YOLOX-s algorithm (IM-YOLO-s), which targets densely arranged offshore ships against complex backgrounds and small, multi-interference ship targets in the open sea. In the feature extraction stage, the CA location attention module is introduced to distribute the weights of the target information along the height and width directions, improving the detection accuracy of the model. In the feature fusion stage, the BiFPN weighted feature fusion algorithm is applied to the neck structure of IM-YOLO-s, further improving the detection accuracy for small target ships. In the model optimization stage, the CIoU loss replaces the IoU loss, the focal loss replaces the confidence loss, and the weight of the category loss is adjusted, which increases the training weight in densely distributed positive-sample areas and reduces the missed detection rate for densely distributed ships. In addition, based on the HRSC2016 dataset, additional images of small and medium-sized offshore ships are added to construct the HRSC2016-Gg dataset, which enhances robustness for detecting marine ships and ships of small and medium pixel sizes. The performance of the algorithm is evaluated on HRSC2016-Gg. The experimental results indicate that the recall rate of IM-YOLO-s for ship detection in the SORS scene is 97.18%, AP@0.5 is 96.77%, and the F1 value is 0.95, which are 2.23%, 2.40%, and 0.01 higher than those of the original YOLOX-s algorithm, respectively. The algorithm thus achieves high-quality ship detection in SORS images with complex backgrounds.
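The CIoU loss that replaces plain IoU adds a center-distance term and an aspect-ratio consistency term; here is a standard PyTorch formulation for corner-format boxes (the published CIoU definition, assumed rather than copied from the paper).

```python
import math
import torch

def ciou_loss(b1, b2, eps=1e-7):
    # b1, b2: (N, 4) boxes as (x1, y1, x2, y2).
    xi1, yi1 = torch.max(b1[:, 0], b2[:, 0]), torch.max(b1[:, 1], b2[:, 1])
    xi2, yi2 = torch.min(b1[:, 2], b2[:, 2]), torch.min(b1[:, 3], b2[:, 3])
    inter = (xi2 - xi1).clamp(0) * (yi2 - yi1).clamp(0)
    a1 = (b1[:, 2] - b1[:, 0]) * (b1[:, 3] - b1[:, 1])
    a2 = (b2[:, 2] - b2[:, 0]) * (b2[:, 3] - b2[:, 1])
    iou = inter / (a1 + a2 - inter + eps)
    # Center distance normalized by the enclosing box diagonal.
    cw = torch.max(b1[:, 2], b2[:, 2]) - torch.min(b1[:, 0], b2[:, 0])
    ch = torch.max(b1[:, 3], b2[:, 3]) - torch.min(b1[:, 1], b2[:, 1])
    rho2 = ((b1[:, 0] + b1[:, 2] - b2[:, 0] - b2[:, 2]) ** 2 +
            (b1[:, 1] + b1[:, 3] - b2[:, 1] - b2[:, 3]) ** 2) / 4
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term.
    w1, h1 = b1[:, 2] - b1[:, 0], b1[:, 3] - b1[:, 1]
    w2, h2 = b2[:, 2] - b2[:, 0], b2[:, 3] - b2[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(w1 / (h1 + eps)) -
                              torch.atan(w2 / (h2 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```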

Optics and Precision Engineering
Mar. 25, 2023, Vol. 31 Issue 6 892 (2023)
Single heading-line survey of MGTS for magnetic target pattern recognition
Qingzhu LI, Zhining LI, Zhiyong SHI, and Hongbo FAN

Planar grid measurement with a magnetic gradient tensor system (MGTS) is often used for magnetic target recognition; however, it is difficult to measure, complicated to analyze, and demands high instrument precision. We therefore propose a magnetic target pattern recognition method based on a single heading-line survey with an MGTS. First, the sensitivity to magnetization direction is analyzed for 15 attributes, including the components, eigenvalues, and invariants of the magnetic gradient tensor (MGT). The more sensitive attributes are used to identify target postures, and the insensitive ones to identify target shapes. Then, the time-domain signal characteristics of the measured quantities are extracted and category labels are set. Principal component analysis (PCA) is employed to reduce dimensionality, visualize features, and determine the optimal dimension. Finally, a kernel extreme learning machine optimized by the sparrow search algorithm (SSA-KELM) is used to train and test the survey sample data, realizing pattern recognition of magnetic targets. In simulation, the recognition of (1) different magnetization direction categories of magnetic dipoles and (2) shape categories of geometric bodies such as the sphere, cuboid, and cylinder is 100% accurate. In the experiment, a total of 180 survey lines were measured for three types of magnets in their corresponding postures. With a training:testing ratio of 6:4, magnet posture and shape recognition were completely accurate.

Optics and Precision Engineering
Mar. 25, 2023, Vol. 31 Issue 6 872 (2023)
Multi-object pedestrian tracking method based on improved high resolution neural network
Hongying ZHANG, Pengyi HE, and Xiaowen PENG

This study proposes an improved high-resolution neural network to address detection and tracking failures caused by target occlusion in multi-target pedestrian tracking. First, to enhance the initial feature extraction capability of the network for pedestrian targets, a second-generation bottleneck residual block structure was introduced into the backbone of the high-resolution neural network, improving the receptive field and feature expression capability. Second, a new residual detection block architecture with a two-layer efficient channel attention module was designed to replace the original one at the multi-scale information exchange stage of the network, improving the test performance of the entire system. Finally, the network was fully trained with appropriate parameters, and the algorithm was tested on multiple test sets. The test results indicated that the tracking accuracy of the proposed algorithm was 0.1%, 1.6%, and 0.8% higher than that of FairMOT on the 2DMOT15, MOT17, and MOT20 datasets, respectively. The tracking stability of the proposed algorithm on longer video sequences was greatly improved, so it can be applied to scenarios with more targets and larger occlusion areas.
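The efficient channel attention (ECA) module named here replaces SE's fully connected bottleneck with a 1-D convolution over the pooled channel descriptor; this is the standard ECA sketch (the kernel size of 3 is illustrative).

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)

    def forward(self, x):                          # x: (N, C, H, W)
        n, c, _, _ = x.shape
        s = x.mean(dim=(2, 3)).view(n, 1, c)       # pooled channel descriptor
        w = torch.sigmoid(self.conv(s)).view(n, c, 1, 1)
        return x * w                               # per-channel reweighting

out = ECA()(torch.randn(2, 48, 16, 16))
```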

Optics and Precision Engineering
Mar. 25, 2023, Vol. 31 Issue 6 860 (2023)
Combining residual shrinkage and spatio-temporal context for behavior detection network
Zhong HUANG, Mengyuan TAO, Min HU, Juan LIU, and Shengbao ZHAN

To solve the problems of highly redundant behavior feature extraction and inaccurate behavior boundary localization in R-C3D, an improved behavior detection network (RS-STCBD) based on residual shrinkage and spatio-temporal context is proposed. First, the residual shrinkage structure and a soft-threshold operation are integrated into the residual module of 3D-ResNet, and a 3D residual shrinkage unit with channel-adaptive soft thresholds (3D-RSST) is designed. Multiple 3D-RSSTs are cascaded to construct a feature extraction network that adaptively eliminates redundant information, such as noise and background, from behavioral features. Second, multi-layer convolutions, instead of a single convolution, are embedded into the proposal subnet to increase the temporal receptive field of the temporal proposal fragments. Finally, a non-local attention mechanism is introduced into the behavior classification subnet to obtain spatio-temporal context information by capturing remote dependencies among high-quality behavior proposals. Experimental results on the THUMOS14 and ActivityNet1.2 datasets show that the mAP@0.5 values of the improved network reach 36.9% and 41.6%, which are 8.0% and 14.8% higher than those of R-C3D, respectively. The behavior detection method based on the improved network increases the accuracy of behavior boundary localization and behavior classification, and it enhances the quality of human-robot interaction in natural scenes.
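The channel-adaptive soft thresholding inside a residual shrinkage unit can be sketched in the deep-residual-shrinkage style: a small fully connected branch predicts a per-channel threshold from the absolute feature mean, and the features are then soft-thresholded. Layer sizes here are illustrative, not the 3D-RSST specification.

```python
import torch
import torch.nn as nn

class SoftThreshold(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels), nn.ReLU(inplace=True),
            nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, x):                       # x: (N, C, ...) 2-D or 3-D features
        dims = tuple(range(2, x.dim()))
        a = x.abs().mean(dim=dims)              # (N, C) absolute feature mean
        tau = (a * self.fc(a)).view(*a.shape, *([1] * len(dims)))
        # Soft thresholding: shrink small activations (noise) to zero.
        return torch.sign(x) * torch.clamp(x.abs() - tau, min=0)

y = SoftThreshold(16)(torch.randn(2, 16, 4, 8, 8))  # works on 3-D video features
```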

Optics and Precision Engineering
Feb. 25, 2023, Vol. 31 Issue 4 552 (2023)
Yolo v3-SPP real-time target detection system based on ZYNQ
Lili ZHANG, Zhen CHEN, Yuxuan LIU, and Lele QU

Target detection algorithms based on convolutional neural networks are developing rapidly, and as their computational complexity increases, so do the requirements for device performance and power consumption. To enable target detection algorithms to be deployed on embedded devices, this study proposes a Yolo v3-SPP target detection system based on the ZYNQ platform, using a hardware-software co-design approach and hardware acceleration of the algorithm through an FPGA. The system is deployed on the XCZU15EG chip, and its power consumption, hardware resources, and performance are analyzed. The network model is first optimized and trained on the Pascal VOC 2007 dataset, and the trained model is then quantized and compiled using the Vitis AI tool to make it suitable for deployment on the ZYNQ platform. To select the best configuration scheme, the impact of each configuration on hardware resources and system performance is explored, and the system power consumption (W), detection speed (FPS), mean average precision (mAP) over all categories, and output error are analyzed. The experimental results show detection speeds of 38.44 FPS and 177 FPS for the Yolo v3-SPP and Yolo v3-Tiny network structures, respectively, with mAPs of 80.35% and 68.55%, on-chip power consumption of 21.583 W, and board power consumption of 23.02 W at a 300 MHz clock frequency and an input image size of 416×416. The proposed target detection system thus meets the requirements of embedded devices for deploying neural network models with low power consumption, real-time operation, and high detection accuracy.

Optics and Precision Engineering
Feb. 25, 2023, Vol. 31 Issue 4 543 (2023)
Quantitative evaluation method for structural similarity of multidimensional point cloud
Ziqian YANG, Yanqiu WANG, Fu ZHENG, and Zhibin SUN

Point cloud registration is a core technology of point cloud data processing, and the quality of a point cloud influences the registration effect. A high-quality point cloud can improve registration accuracy, spatial integrity, and SLAM performance, so assessing the quality of point cloud data has significant practical value. Point cloud data obtained by sensors contain noise from systematic and non-systematic errors, making point cloud data processing crucial; however, there has been no objective method to evaluate the processing effect. This paper proposes a quantitative evaluation method for the structural similarity of multidimensional point clouds. The method compares the point cloud data before and after filtering with standard data: the mean, standard deviation, and covariance of all point coordinates on each of the three coordinate axes are compared, the structural similarity values on the three axes are weighted, and the similarity and correlation degree of the three-dimensional structure are obtained. The method thereby evaluates point cloud filtering, point cloud sparsification, and point cloud data quality, and experiments verify that it improves registration accuracy. Experiments demonstrate its capacity to evaluate the quality of 3D point clouds obtained under different noise types and processing methods, providing a reference for point cloud registration and improving its accuracy, efficiency, and quality.
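Since the abstract names the exact ingredients (per-axis means, standard deviations, and covariance, combined SSIM-style and then weighted across axes), a small faithful sketch is possible. The stability constants and equal axis weights are illustrative, and point-to-point correspondence between the two clouds is assumed.

```python
import numpy as np

def axis_ssim(a, b, c1=1e-4, c2=9e-4):
    # SSIM-style comparison of one coordinate axis of two clouds.
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)
            / ((mu_a**2 + mu_b**2 + c1) * (a.var() + b.var() + c2)))

def pointcloud_ssim(p, q, weights=(1/3, 1/3, 1/3)):
    # p, q: (N, 3) corresponding clouds; weighted sum of per-axis SSIM.
    return sum(w * axis_ssim(p[:, i], q[:, i]) for i, w in enumerate(weights))

p = np.random.rand(500, 3)
print(pointcloud_ssim(p, p + 0.01 * np.random.randn(500, 3)))
```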

Optics and Precision Engineering
Feb. 25, 2023, Vol. 31 Issue 4 533 (2023)
Ship detection oriented to compressive sensing measurements of space optical remote sensing scenes
Shuming XIAO, Ye ZHANG, Xuling CHANG, and Jianbo SUN

A compressive sensing (CS)-based space optical remote sensing (SORS) imaging system can simultaneously sample and compress a scene in hardware at the sensing stage, but it must ordinarily reconstruct the original scene before performing ship detection, and CS scene reconstruction is computationally expensive, memory intensive, and time-consuming. This paper proposes an algorithm named compressive sensing and improved you only look once (CS-IM-YOLO) for direct ship detection on the measurements produced by the imaging system. To simulate the block compression sampling process of the imaging system, a convolutional measurement layer whose stride equals its kernel size performs the convolution on the scene, projecting the high-dimensional image signal into a low-dimensional space to obtain full-image CS measurements. Given the measurements of the scene, the proposed ship detection network extracts the ship coordinates: the squeeze-and-excitation network (SENet) module is imported into the backbone, the improved backbone extracts ship feature information from the measurements, and a feature pyramid network enhances feature extraction while fusing the feature information of the shallow, middle, and deep layers before predicting the ship coordinates. Notably, CS-IM-YOLO connects the convolutional measurement layer and the CS-based ship detection network for end-to-end training, which considerably simplifies preprocessing. We evaluate the algorithm on the HRSC2016 dataset. The experimental results show that the precision of CS-IM-YOLO for ship detection on CS measurements of SORS scenes is 91.60%, the recall is 87.59%, the F1 value is 0.90, and the AP value is 94.13%, demonstrating accurate ship detection directly from CS measurements.
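The convolutional measurement layer is described precisely enough to sketch: a convolution whose stride equals its kernel size acts on non-overlapping BxB blocks, and the number of output channels sets the sampling rate. The block size and rate below are illustrative.

```python
import torch
import torch.nn as nn

# Block-based CS measurement as a strided convolution: kernel size equals
# stride, so each output position holds m measurements of one BxB block.
B, rate = 16, 0.25
m = int(rate * B * B)                    # measurements per block
measure = nn.Conv2d(1, m, kernel_size=B, stride=B, bias=False)

scene = torch.randn(1, 1, 256, 256)      # stand-in remote sensing tile
y = measure(scene)                       # (1, m, 16, 16) CS measurements
```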

Optics and Precision Engineering
Feb. 25, 2023, Vol. 31 Issue 4 517 (2023)
Matching point pair optimization registration method for point cloud model
Yongwei YU, Kang WANG, Liuqing DU, and Bing QU

To address the large registration errors and poor adaptability of the traditional iterative closest point (ICP) algorithm when point clouds overlap only partially, an improved registration algorithm based on weighted optimization of matching point pairs is proposed. First, an improved voxel downsampling algorithm is proposed to sample the point clouds, which reduces the amount of data and improves the robustness of the algorithm against noise. Then, an improved Sigmoid function is used to assign different weights to the matching point pairs participating in registration, which overcomes the drawback of traditional algorithms that overlook the fact that matching point pairs with small distances may still be incorrect, and it improves registration accuracy and convergence speed. Finally, a method for solving the registration parameters using singular value decomposition (SVD) is proposed to further improve registration accuracy. Registration and noise experiments with different degrees of overlap were performed, and the proposed algorithm was verified on the three-dimensional point cloud reconstruction of a crankshaft. The experimental results showed that, compared with the Tr-ICP and AA-ICP algorithms, the error of the proposed algorithm was reduced by approximately 34.1% and 29%, respectively, and the registration time was shortened by approximately 16.1% compared with the Tr-ICP algorithm. Hence, the proposed algorithm has higher registration accuracy, better applicability, and greater robustness than traditional algorithms.
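A weighted SVD (Kabsch-style) solve for the rigid transform, with an illustrative sigmoid weighting of matched pairs, might look as follows; the sigmoid shape parameters stand in for the paper's improved Sigmoid function.

```python
import numpy as np

def weighted_rigid_transform(P, Q, d0=0.05, k=50.0):
    # P, Q: (N, 3) matched points. Pairs with larger residual distance
    # get smaller sigmoid weights (d0, k are illustrative parameters).
    d = np.linalg.norm(P - Q, axis=1)
    w = 1.0 / (1.0 + np.exp(k * (d - d0)))
    w /= w.sum()
    mp, mq = w @ P, w @ Q                       # weighted centroids
    H = (P - mp).T @ ((Q - mq) * w[:, None])    # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflection
    R = Vt.T @ S @ U.T
    t = mq - R @ mp
    return R, t                                 # q ~= R p + t
```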

Optics and Precision Engineering
Feb. 25, 2023, Vol. 31 Issue 4 503 (2023)
Neural network-based computational holographic encryption image reconstruction scheme for chaotic iris phase mask
Tao HU, Xueru SUN, and Weimin JIN

To expand the decryption options of computational holographic encrypted images for a symmetric-asymmetric hybrid encryption system that cannot easily be attacked illegally, a scheme is proposed that uses a neural network to restore computational holographic images encrypted with a chaotic iris phase mask. First, a plaintext image is encrypted and a computational hologram ciphertext image is generated; numerous ciphertext-plaintext image pairs are then generated as datasets, on which a neural network is built, trained, and tested. Results show that the trained neural network can fit the mapping from the ciphertext image to the plaintext image, so that neither the public nor the private key is needed to decrypt the ciphertext image. The average cross-correlation coefficient is 0.984, the average peak signal-to-noise ratio is 61.0 dB, and the average structural similarity is 0.77, indicating the good quality of the plaintext images recovered by the neural network. Even when the ciphertext image is polluted with noise, a high-quality image is still obtained. The goal of decrypting ciphertext images via a neural network is thus achieved, and the scheme is shown to be feasible and robust.

Optics and Precision Engineering
Feb. 10, 2023, Vol. 31 Issue 3 417 (2023)
Defect detection of cylindrical surface of metal pot combining attention mechanism
Jian QIAO, Nengda CHEN, Yanxiong WU, Yang WU, and Jingwei YANG

To achieve automatic, rapid detection and sorting of highly reflective metal cylindrical pots and to overcome the slow speed and low efficiency of metal pot surface defect detection, a bi-directional feature pyramid network (BiFPN) was introduced into the YOLOX network. A lightweight feature fusion network model was devised on the basis of an attention mechanism, realizing a lightweight computing model. The attention mechanism module learns the channel and spatial structure of the feature information, effectively alleviating the semantic gap between multi-scale features and improving detection precision. Considering the unbalanced distribution of the network's learning weights between hard and easy samples, a classification loss function incorporating an attenuation factor was designed. Ablation comparisons of the feature fusion network, classification loss function, and attention module position were conducted on the metal pot cylindrical surface defect dataset. The experimental results show that the fused attention mechanism model can effectively identify six types of defects with different shapes, the average detection precision mAP0.5 on the test set reached 90.92%, and the detection frame rate was 30.84 FPS. Cylindrical surface defects on metal pots can thus be identified and located rapidly and with high precision using the proposed model.
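BiFPN's signature operation is fast normalized fusion: non-negative learnable weights, normalized to sum to one, blend feature maps of the same resolution. The sketch below is the standard formulation from the BiFPN literature, assumed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FastFusion(nn.Module):
    def __init__(self, n_inputs):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))

    def forward(self, feats):                 # list of same-shape tensors
        w = F.relu(self.w)                    # keep weights non-negative
        w = w / (w.sum() + 1e-4)              # fast normalization
        return sum(wi * f for wi, f in zip(w, feats))

out = FastFusion(2)([torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)])
```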

Optics and Precision Engineering
Feb. 10, 2023, Vol. 31 Issue 3 404 (2023)
Small object detection based on GM-APD lidar data fusion
Dakuan DU, Jianfeng SUN, Yuanxue DING, Peng JIANG, and Hailong ZHANG

Geiger-mode avalanche photodiode (GM-APD) lidar has single-photon detection sensitivity, which greatly reduces system volume and power consumption, makes the system feasible for practical application, and has made it a hot topic in recent studies. However, owing to the limited pixel number, the spatial resolution is low, which makes it difficult to obtain a clear contour of a remote target, and the object detection rate is low. To solve this problem, a detection algorithm based on multi-level processing of intensity and range images was proposed to exploit the correlation between intensity image features and point cloud features and improve the probability of small object detection. First, an improved feature pyramid network (FPN) combines the receptive field block (RFB) and convolutional block attention module (CBAM) with the feature extraction network to enhance the selection accuracy on intensity images. Second, the intensity and range images are combined into point clouds with intensity information in the candidate regions. Finally, a dynamic graph convolutional network (DGCNN) performs secondary detection on the targets in the candidate regions, and the point cloud information is used to further select objects. On a GM-APD lidar long-range vehicle dataset, the AP of the network reaches 98.8%, with good robustness in complex scenes such as incomplete vehicle structure, weak echo, and strongly reflected light spots. Compared with SSD and YOLOv5, the detection accuracy of the network improved by 3.1% and 2.5%, respectively, demonstrating its feasibility for lidar dim object detection.

Optics and Precision Engineering
Feb. 10, 2023, Vol. 31 Issue 3 393 (2023)
Defect detection of low-resolution ceramic substrate image based on knowledge distillation
Feng GUO, Xiaodong SUN, Qibing ZHU, Min HUANG, and Xiaoxiang XU

Ceramic substrates are a vital foundational material for electronic devices, and defect detection for ceramic substrates using machine vision combined with deep learning is of great importance for ensuring product quality. Increasing the field of view of the imaging equipment, so that multiple ceramic substrates can be imaged simultaneously, can significantly improve detection speed; however, it also lowers image resolution and subsequently reduces defect detection accuracy. To solve these problems, an automatic defect detection method for low-resolution ceramic substrates based on knowledge distillation is proposed. The method uses the YOLOv5 framework to construct a teacher network and a student network. Following the idea of knowledge distillation, high-resolution image feature information obtained by the teacher network guides the training of the student network to improve its ability to detect defects in low-resolution ceramic substrate images. Moreover, a feature fusion module based on the coordinate attention (CA) idea is introduced into the teacher network, enabling it to learn features adapted to both high-resolution and low-resolution image information and thus better guide the student network. Finally, a confidence loss function based on the gradient harmonizing mechanism (GHM) is introduced to improve the defect detection rate. Experimental results demonstrate that the proposed method achieves an average accuracy and average recall of 96.80% and 90.01%, respectively, for five types of defects (stain, foreign matter, gold edge bulge, ceramic gap, and damage) in low-resolution (224×224) input images. Compared with current mainstream object detection algorithms, the proposed algorithm achieves better detection results.
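The teacher-guides-student idea can be sketched as a feature-mimicking distillation loss: the student's low-resolution feature map, projected by a 1x1 adapter, is regressed onto the teacher's high-resolution feature map. This is one common KD formulation, assumed for illustration; the paper's exact distillation target may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def feature_distill_loss(student_feat, teacher_feat, adapter):
    t = teacher_feat.detach()                # teacher supervises, no gradient
    s = adapter(student_feat)                # 1x1 conv aligns channel counts
    if s.shape[-2:] != t.shape[-2:]:
        s = F.interpolate(s, size=t.shape[-2:], mode="bilinear",
                          align_corners=False)
    return F.mse_loss(s, t)

adapter = nn.Conv2d(128, 256, 1)             # illustrative channel sizes
loss = feature_distill_loss(torch.randn(2, 128, 28, 28),
                            torch.randn(2, 256, 56, 56), adapter)
```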

Optics and Precision Engineering
Oct. 25, 2023, Vol. 31 Issue 20 3065 (2023)
CT and PET medical image fusion based on LL-GG-LG Net
Tao ZHOU, Xiangxiang ZHANG, Huiling LU, Qi LI, and Qianru CHENG

Multimodal medical image fusion plays a crucial role in clinical applications. Most existing methods focus on local feature extraction while leaving global dependencies insufficiently explored, and they do not consider interactions between global and local information. This makes it difficult to handle the complexity of patterns and the similarity in intensity between the surrounding tissue (background) and the lesion area (foreground). To address these issues, this paper proposes an LL-GG-LG Net model for PET and CT medical image fusion. First, a Local-Local fusion (LL) module is proposed, which uses a two-level attention mechanism to better focus on local detail features. Next, a Global-Global fusion (GG) module is designed, which introduces local information into the global information by adding a residual connection mechanism to the Swin Transformer, thereby improving the Transformer's attention to local information. A Local-Global fusion (LG) module based on a differentiable-architecture-search adaptive dense fusion network is then proposed, which fully captures global relationships and retains local cues, effectively addressing the high similarity between background and focus areas. The model's effectiveness is validated on a clinical multimodal lung medical image dataset. The experimental results show that, compared with seven other methods, the proposed method improves the objective image fusion quality indexes, namely the average gradient (AG), edge intensity (EI), edge retention (QAB/F), spatial frequency (SF), standard deviation (SD), and information entropy (IE), by 21.5%, 11%, 4%, 13%, 9%, and 3% on average, respectively. The model can highlight information in the lesion areas; the fused image structure is clear, and detailed texture information is preserved.
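Two of the reported indexes are simple to state exactly; for reference, here are common definitions of average gradient (AG) and information entropy (IE) for an 8-bit fused image (standard formulations, which may differ in constants from the paper's evaluation code).

```python
import numpy as np

def average_gradient(img):
    # AG: mean magnitude of local gray-level change (img: 2-D float array).
    gy, gx = np.gradient(img.astype(np.float64))
    return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2))

def information_entropy(img, bins=256):
    # IE: Shannon entropy of the gray-level histogram.
    hist, _ = np.histogram(img, bins=bins, range=(0, 255))
    p = hist[hist > 0] / hist.sum()
    return -np.sum(p * np.log2(p))

img = np.random.rand(64, 64) * 255           # stand-in fused image
print(average_gradient(img), information_entropy(img))
```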

Optics and Precision Engineering
Oct. 25, 2023, Vol. 31 Issue 20 3050 (2023)
Spatial information adaptive regulation and feature alignment for infrared methane instance segmentation
Zifen HE, Huizhu CAO, Yinhui ZHANG, and Hong ZHUANG

Conventional contact methane leak sensors suffer from a small detection range and low efficiency, whereas machine vision algorithms combined with non-contact infrared thermal imaging make infrared methane instance segmentation possible at long distances and over large areas, a significant advantage for improving methane detection efficiency and ensuring personnel safety. However, infrared methane instance segmentation is limited by problems such as blurred contours and low contrast between the leaking methane gas and the background, and it can be affected by atmospheric flow. In response to these problems, an adaptive spatial information regulation and feature alignment network (AFNet) is proposed to segment infrared instances of methane leakage. First, to strengthen feature extraction, an adaptive spatial information regulation module endows the backbone network with adaptive weights for residual blocks of different scales, enriching the feature space extracted by the model. Second, to meet the requirements of foreground target positioning and contour segmentation under complex methane gas contours, a weighted bidirectional pyramid is designed to reduce the diffusion and loss of spatial location and instance edge information in low-level features caused by the top-down propagation of the feature pyramid. Finally, a prototype feature alignment module is designed to capture the semantic relationships between long-distance gas features, enriching the semantic information of the prototype and improving the quality of the generated target masks, thereby improving methane instance segmentation accuracy. Experimental results show that the proposed AFNet model achieves AP50@95 and AP50 quantitative segmentation accuracies of 42.42% and 92.18%, which are 9.79% and 6.18% higher than those of the original Yolact, respectively. In addition, the inference speed reaches 36.80 frames/s, meeting the requirements of methane leakage segmentation. These results validate the effectiveness and engineering practicality of the proposed algorithm for infrared methane leakage segmentation.

Optics and Precision Engineering
Oct. 25, 2023, Vol. 31 Issue 20 3034 (2023)
Lightweight target detection network for UAV platforms
Dandan HUANG, Han GAO, Zhi LIU, Lintao YU, and Huiji WANG

A lightweight target detection network for unmanned aerial vehicle (UAV) platforms is proposed to address the large image-scale variation, small target sizes, and limited embedded computing resources encountered in on-board UAV target detection. The network uses YOLOv5 as the benchmark model. First, detection branches are used to address scale variation. Then, a small-target detection metric that mixes the normalized Wasserstein distance (NWD) with the traditional IoU is used to address inaccurate small-target detection. In addition, a C3_FN lightweight network structure combining FasterNet and C3 reduces the computational burden of the network, making it more suitable for UAV platforms. The performance of the algorithm was tested on a simulation platform and an embedded platform using the UAV target detection dataset VisDrone. The simulation results indicate that the proposed network improves the mAP0.5 and mAP0.5-0.95 metrics by 6.6% and 4.8%, respectively, compared with the benchmark network, with an inference time of only 45.9 ms; the detection results are superior to those of mainstream UAV target detection networks. The results on the embedded device (NVIDIA Jetson Nano) indicate that the proposed algorithm achieves high accuracy and near-real-time detection with limited hardware resources.
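The normalized Wasserstein distance used in the mixed small-target metric models each box as a 2-D Gaussian and exponentiates the 2-Wasserstein distance between them; this follows the published NWD formulation (the constant C is dataset-dependent; 12.8 is the value suggested in the original NWD paper, assumed here).

```python
import numpy as np

def nwd(b1, b2, C=12.8):
    # b1, b2: boxes as (cx, cy, w, h). Each box is modeled as the Gaussian
    # N([cx, cy], diag(w/2, h/2)^2); the squared 2-Wasserstein distance
    # between such Gaussians reduces to a Euclidean distance of parameters.
    g1 = np.array([b1[0], b1[1], b1[2] / 2, b1[3] / 2])
    g2 = np.array([b2[0], b2[1], b2[2] / 2, b2[3] / 2])
    w2 = np.sum((g1 - g2) ** 2)
    return np.exp(-np.sqrt(w2) / C)

print(nwd((10, 10, 4, 4), (11, 10, 4, 4)))   # near-identical small boxes
```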

Optics and Precision Engineering
Oct. 25, 2023, Vol. 31 Issue 20 3021 (2023)
Polarization parameter partition optimization restoration method for underwater degraded image
Ronghua LI, Changye CAI, Shenghui ZHANG, Yunhe XU, and Haotian CAO

In real water environments, common imaging problems include reduced contrast, low definition, and information attenuation. Traditional methods estimate the polarization information of the entire image; however, in real underwater images the targets have complex polarization characteristics, so the restoration of some target areas is poor and can even degrade. In this study, a polarization parameter partition optimization restoration method for underwater degraded images is proposed. First, connected domains of high- and low-polarization objects are extracted after the two orthogonally polarized images are processed by block contrast enhancement and guided filtering, and the extraction of high- and low-polarization object regions is optimized based on the pixel values in the polarization image. Second, the polarization of each object is estimated separately, which avoids the incorrect estimation of complex objects in traditional global estimation methods. Finally, the degree-of-polarization image of the backscattered light is iteratively optimized to obtain the optimal selection. Experimental results show that the subjective visual quality of the images improves significantly. In two initial experiments, compared with the original light intensity maps under low turbidity, the measure of enhancement by entropy (EME) of the objective evaluation index and the contrast increase by 554% and 528% on average, respectively. In a third set of experiments, under low illumination and high turbidity, the EME value and contrast improve by 379% and 956%, respectively. Three sets of natural image quality evaluation (NIQE) indices indicate that the proposed method performs well and produces more natural images. Compared with the traditional method, the proposed method can effectively restore turbid images, increase image contrast, weaken information attenuation, and achieve a better sharpening effect.

Optics and Precision Engineering
Oct. 25, 2023, Vol. 31 Issue 20 3010 (2023)
Indoor self-supervised monocular depth estimation based on level feature fusion
Deqiang CHENG, Huaqiang ZHANG, Qiqi KOU, Chen LÜ, and Jiansheng QIAN

Owing to the many low-texture and poorly lit areas in complex indoor scenes, current self-supervised monocular depth estimation models suffer from imprecise depth predictions, noticeable blurring around object edges, and significant loss of detail. This paper introduces an indoor self-supervised monocular depth estimation network based on level feature fusion. First, to enhance the visibility of poorly lit areas while maintaining brightness consistency and to counter the degradation caused by pseudo planes, the Mapping-Consistent Image Enhancement module was applied to the indoor images. Subsequently, a novel self-supervised monocular depth estimation network incorporating the attention-based Cross-Level Feature Adjustment module was proposed. This module effectively fuses multilevel feature information from the encoder to the decoder, enhancing the network's use of feature information and reducing the semantic gap between the predicted and true depth. Finally, the Gram Matrix Similarity Loss, based on image style features, was introduced as an additional self-supervised signal to further constrain the network and improve its depth prediction accuracy. Trained and tested on the NYU Depth V2 and ScanNet indoor datasets, the model achieves pixel accuracy rates of 81.9% and 76.0%, respectively, and a comparative analysis with existing mainstream indoor self-supervised monocular depth estimation models is provided. The proposed network excels at preserving object edges and details, effectively enhancing the accuracy of the predicted depth.
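
The Gram Matrix Similarity Loss can be sketched as a style-feature consistency term; the pairing of feature maps shown here (for example, features of the synthesized view versus the target view) is an assumption about how such a loss is typically wired in:

    import torch
    import torch.nn.functional as F

    def gram_matrix(feat):
        """Gram matrix of a feature map: (B, C, H, W) -> (B, C, C)."""
        b, c, h, w = feat.shape
        f = feat.reshape(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    def gram_similarity_loss(feat_a, feat_b):
        """Penalize differences between the style (channel-correlation)
        statistics of two feature maps."""
        return F.mse_loss(gram_matrix(feat_a), gram_matrix(feat_b))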

Optics and Precision Engineering
Oct. 25, 2023, Vol. 31 Issue 20 2993 (2023)
Fine-grained semantic segmentation network for enhancing local saliency of laser point clouds
Kun ZHANG, Liting ZHANG, Xiaohong WANG, Yawei ZHUN, and Kunpeng ZHOU

Fine-grained semantic segmentation of point clouds, that is, object component segmentation, has important applications in industrial production, such as manipulator control, intelligent assembly, and object detection. However, because point cloud data are scattered, the geometric features at the boundaries of object parts are not obvious and are difficult to compute, resulting in low fine-grained segmentation precision that fails to meet production needs. For component-level point cloud segmentation, this paper proposes a fine-grained semantic segmentation network that enhances the local saliency of point clouds, in which the contextual information of local data is constructed to improve fine-grained segmentation precision. The network establishes an improved farthest-point sampling algorithm based on geometric curvature to enhance the feature-computing ability on local data subsets of the point cloud, and creates a multiscale high-dimensional feature extractor for extracting high-dimensional features at different scales. When computing the point cloud features, a seq2seq model with an attention mechanism was used to fuse the high-dimensional features of different scales and obtain the contextual information for fine-grained semantic segmentation. As a result, the fine-grained segmentation accuracy was improved, particularly at part boundaries. The experimental results show that the network achieves an overall intersection-over-union of 85.2% and an accuracy of 95.6% on the ShapeNet part dataset, and it also exhibits a certain generalization ability. This method is of great significance for the fine-grained semantic segmentation of three-dimensional objects.
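
A minimal sketch of a curvature-aware farthest-point sampling step, assuming per-point curvature values are already estimated; the trade-off weight alpha is an assumed parameter:

    import numpy as np

    def curvature_weighted_fps(points, curvature, n_samples, alpha=0.5):
        """Farthest-point sampling biased toward high-curvature regions.
        points: (N, 3) coordinates; curvature: (N,) per-point curvature."""
        n = points.shape[0]
        selected = [0]                           # start from an arbitrary point
        dist = np.full(n, np.inf)
        curv = (curvature - curvature.min()) / (np.ptp(curvature) + 1e-9)
        for _ in range(n_samples - 1):
            d = np.linalg.norm(points - points[selected[-1]], axis=1)
            dist = np.minimum(dist, d)           # distance to nearest selected point
            score = dist * (1.0 + alpha * curv)  # curvature-weighted distance
            selected.append(int(np.argmax(score)))
        return np.asarray(selected)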

Optics and Precision Engineering
Jan. 25, 2023, Vol. 31 Issue 2 288 (2023)
Real-time semantic segmentation network of wire harness terminals based on multiple receptive field attention
Yanan GU, Ruyi CAO, Lishan ZHAO, Bibo LU, and Baishun SU

Wire harnesses are now widely used, and the harness terminal, an important component of a harness, requires strict quality inspection. To improve the accuracy and efficiency of harness terminal quality detection, a real-time semantic segmentation network using multiple receptive field (MRF) attention, called MRF-UNet, is proposed in this study. First, an MRF attention module is used as the basic module for feature extraction, improving the feature extraction and generalization abilities of the model. Second, feature fusion is used to implement skip connections and reduce the computational load of the model. Finally, deconvolution and convolution are used for feature decoding to reduce the network depth and improve the algorithm's performance. The experimental results demonstrate that the mean intersection over union, mean pixel accuracy, and Dice coefficient of MRF-UNet on the harness terminal test dataset are 97.54%, 98.83%, and 98.31%, respectively, and the inference speed of the model is 15 FPS. Compared with BiSeNet, UNet, SegNet, and other mainstream segmentation networks, MRF-UNet produces more accurate and faster segmentation of microscopic harness terminal images, providing data support for subsequent quality detection.

Optics and Precision Engineering
Jan. 25, 2023, Vol. 31 Issue 2 277 (2023)
Gait recognition algorithm in dense occlusion scene
Yi GAO, and Miao HE

Gait recognition algorithms mainly rely on the contour sequences of pedestrian targets for feature extraction and recognition. In practical applications, pedestrians walk together, and a target's contour is easily occluded and disturbed by other pedestrians, which significantly reduces recognition accuracy. To improve the robustness of gait recognition in dense occlusion scenes, a deep-learning gait recognition algorithm based on unordered contour sequences is proposed. First, a simulation based on the CASIA-B dataset is conducted to establish a target contour dataset for dense occlusion scenes and verify the occlusion robustness of the algorithm. Second, a data augmentation method based on random binary dilation is proposed. Third, because theory and experiment demonstrate the limitations of the horizontal pyramid pooling (HPP) structure for gait recognition, a degenerated horizontal pyramid pooling (DHPP) structure is proposed. By combining the DHPP structure, the CoordConv method, joint training, and pruning, the perception of absolute position information in deep features is enhanced, the robustness of the algorithm to occlusion is improved, and the feature dimension of the target is reduced. The experimental results indicate that the proposed method effectively improves the robustness of gait recognition.
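
The random binary dilation augmentation can be sketched as follows: a silhouette is dilated with a randomly sized disk to mimic the contour thickening caused by nearby pedestrians (the probability and radius bounds are assumed values):

    import numpy as np
    from scipy.ndimage import binary_dilation

    def random_binary_dilation(silhouette, max_radius=3, p=0.5, rng=None):
        """Randomly dilate a binary gait silhouette for augmentation."""
        rng = rng or np.random.default_rng()
        if rng.random() > p:
            return silhouette
        r = int(rng.integers(1, max_radius + 1))
        y, x = np.ogrid[-r:r + 1, -r:r + 1]
        disk = x * x + y * y <= r * r            # disk structuring element
        return binary_dilation(silhouette.astype(bool), structure=disk)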

Optics and Precision Engineering
Jan. 25, 2023, Vol. 31 Issue 2 263 (2023)
TCS-YOLO model for global oil storage tank inspection
Xiang LI, Rigen TE, Feng YI, and Guocheng XU

As a critical strategic resource, crude oil plays a key role in many fields; in particular, it is important to the Chinese economy and military. In this study, we propose a target detection model called Transformer-CBAM-SIoU YOLO (TCS-YOLO) based on YOLOv5. The model was implemented and trained to identify and classify oil storage tanks using the Jilin-1 dataset of optical remote sensing satellite images. It adds a C3TR layer based on the Transformer architecture to optimize the network, as well as a Convolutional Block Attention Module (CBAM) to introduce an attention mechanism into the network layers. Moreover, we adopt the Scale-Sensitive Intersection over Union (SIoU) loss instead of the Complete Intersection over Union (CIoU) loss as the positioning loss function. Experimental results showed that, compared with YOLOv5, TCS-YOLO's model complexity in GFLOPs (giga floating-point operations) was reduced by an average of 3.13%, the number of parameters was reduced by an average of 0.88%, and the inference time was reduced by an average of 0.2 ms, while mAP0.5 increased by 0.2% on average and mAP0.5:0.95 increased by 1.26% on average. Compared with the conventional YOLOv3, YOLOv4, YOLOv5, and Swin Transformer models, TCS-YOLO exhibited more efficient characteristics. The TCS-YOLO model is broadly applicable to the identification of global oil storage tanks, and, combined with techniques for calculating the storage rates of identified tanks, it can provide a technical reference for remote sensing data in the field of energy futures.
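
The CBAM block added to the network follows the standard published design, channel attention followed by spatial attention; a compact sketch (the reduction ratio and kernel size are the usual defaults, assumed here):

    import torch
    import torch.nn as nn

    class CBAM(nn.Module):
        """Convolutional Block Attention Module: channel then spatial attention."""
        def __init__(self, channels, reduction=16, kernel_size=7):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels))
            self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

        def forward(self, x):
            b, c, _, _ = x.shape
            # channel attention from average- and max-pooled descriptors
            avg = self.mlp(x.mean(dim=(2, 3)))
            mx = self.mlp(x.amax(dim=(2, 3)))
            x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
            # spatial attention from channel-wise average and max maps
            s = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
            return x * torch.sigmoid(self.spatial(s))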

Optics and Precision Engineering
Jan. 25, 2023, Vol. 31 Issue 2 246 (2023)
Parallel path and strong attention mechanism for building segmentation in remote sensing images
Jianhua YANG, Hao ZHANG, and Haiyang HUA

Building segmentation in remote sensing images is widely used in urban planning and military applications and is a current focus of remote sensing research. Large scale variations between buildings, building occlusion, and building-like shadows and edges in remote sensing images lead to low segmentation accuracy. To solve these problems, a convolutional neural network with parallel paths and a strong attention mechanism was developed. Based on the residual-connection idea of ResNet, the model uses ResNet as the basic network to increase network depth and applies convolutional downsampling to obtain parallel paths that extract multi-scale building features, reducing the influence of scale changes between buildings. A strong attention mechanism was then added to enhance the fusion of multi-scale information and the discrimination of different features, suppressing the influence of building occlusion and shadows. Finally, a pyramid spatial pooling module was added after the multi-scale fused features to suppress holes inside buildings in the segmentation results and improve segmentation accuracy. Experiments were conducted on the WHU and Massachusetts Buildings public datasets, and the segmentation results were quantitatively compared using four indicators: MIoU, recall, precision, and F1-score. On the Massachusetts Buildings dataset, the MIoU reaches 72.84%, which is 1.46% higher than that obtained with ResUNet-a. Thus, the model effectively improves the segmentation accuracy of buildings in remote sensing images.

Optics and Precision Engineering
Jan. 25, 2023, Vol. 31 Issue 2 234 (2023)
Adaptive optimization control method for overexposure of industrial camera
Wenlin WU, Xiaobo LIAO, Junzhong LI, Jun ZHOU, and Jian ZHUANG

Industrial cameras cannot clearly observe targets in real time under overexposed lighting conditions with sudden changes in brightness. An adaptive exposure control method is proposed to address this problem. First, the weighted average gray value of the image in a preset reference area is calculated, and the exposure value of the image is then computed. Next, a parameter control optimization method based on an improved "S" curve is designed to optimize and adjust the camera's internal parameters. Finally, optimal clarity is obtained with reference to the preset position. Experimental results show that the proposed method takes approximately 0.08 s to complete the entire camera adjustment process. Compared with the automatic exposure algorithm implemented in the camera hardware and an adaptive exposure algorithm based on image histogram features under the same conditions, the average standard deviation of the Laplacian of the images produced by the proposed algorithm is 54.3% and 20.6% greater, respectively. Therefore, the proposed algorithm can effectively enhance the adaptability of cameras under sudden changes in brightness and can be implemented in various practical applications.
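
A minimal sketch of the control idea, assuming a tanh-shaped "S" curve and illustrative gain constants (the paper's exact curve and parameter values are not reproduced here):

    import numpy as np

    def weighted_mean_gray(img, roi, weights=None):
        """Weighted average gray level of a preset reference region;
        roi = (x, y, w, h); weights defaults to uniform."""
        x, y, w, h = roi
        patch = img[y:y + h, x:x + w].astype(float)
        weights = np.ones_like(patch) if weights is None else weights
        return float((patch * weights).sum() / weights.sum())

    def s_curve_exposure_step(exposure, gray, target=128.0, k=0.02, max_gain=2.0):
        """One exposure update on an S-shaped curve: small gray-level errors
        give gentle corrections, large errors saturate at max_gain."""
        err = target - gray
        return exposure * max_gain ** np.tanh(k * err)

Because the curve saturates, the loop converges in a few frames without the oscillation that a proportional-only gain would produce under a sudden brightness jump.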

Optics and Precision Engineering
Jan. 25, 2023, Vol. 31 Issue 2 226 (2023)
Polyp image segmentation based on multi-scale ResNeSt-50 aggregation network and message passing
Ping XIA, Guangyi ZHANG, Bangjun LEI, Yaobing ZOU, and Tinglong TANG

The boundary between colorectal polyps and normal tissue is typically not evident, making it challenging to accurately locate polyps. This study developed a novel polyp image segmentation method that combines a multiscale ResNeSt-50 aggregation network with sequential tree-reweighted message passing (TRW-S). First, a multiscale ResNeSt-50 aggregation network with an encoding–decoding structure was constructed to improve the expressiveness of the network. The encoder cascades a convolution module and a four-level ResNeSt module to build the ResNeSt-50 backbone, which realizes linear integration of and communication between cross-channel information; ResNeSt-50 uses split attention to strengthen the contribution of important channel groups and enhance the ability of the residual module to extract polyp image information. In the bottom three layers of the decoder, multilayer receptive field blocks (RFBs) were used to obtain multiscale information, and a dense aggregation module was then used to integrate their outputs. A fast decoding method was used for the output, which maintains segmentation performance while reducing the number of parameters. Second, a test-time augmentation (TTA) module was used to improve prediction accuracy and enhance the generalization ability of the network when generating predicted images. Finally, a TRW-S algorithm based on Markov random fields was constructed to postprocess the predicted images, improving the continuity of segmentation edges and the consistency within segmented regions. Experimental results on Kvasir-SEG, an open-access dataset of gastrointestinal polyp images, show that the proposed method achieves an mDice of 91.6%, an mIoU of 86.3%, an Smeasure of 92.1%, and an MAE of 2.3%, outperforming polyp segmentation algorithms based on U-Net, U-Net++, ResUNet, SFA, and PraNet. Test results on the unseen ETIS-LaribPolypDB and ColonDB datasets indicate that the proposed model improves mDice over PraNet by 16.4% and 7.7%, respectively, and on the ETIS-LaribPolypDB dataset the model proved highly sensitive to small lesions. Thus, the proposed model performs excellently in terms of segmentation-region consistency, segmentation-edge continuity, contour sharpness, and the ability to capture small lesions, and it generalizes well to unseen datasets.

Optics and Precision Engineering
Sep. 25, 2023, Vol. 31 Issue 18 2765 (2023)
Spatial-spectral Transformer for classification of medical hyperspectral images
Yuan LI, Xu SHI, Zhengchun YANG, Qijuan TAN, and Hong HUANG

The development of hyperspectral imaging (HSI) technology offers new avenues for non-invasive medical imaging. However, medical hyperspectral images are characterized by high dimensionality, high redundancy, and the property of "graph-spectral uniformity," necessitating high-precision diagnostic algorithms. In recent years, transformer models have been widely applied in medical hyperspectral image processing. However, medical hyperspectral images obtained using different instruments and acquisition methods differ significantly, which considerably hinders the practical application of existing transformer-based diagnostic models. To address these issues, a spatial–spectral self-attention transformer (S3AT) algorithm is proposed to adaptively mine the intrinsic relations between pixels and bands. First, in the transformer encoder, a spatial–spectral self-attention mechanism is designed to obtain key spatial information and important bands of the hyperspectral images from different viewpoints, and the spatial–spectral self-attention obtained from the different views is fused. Second, in the classification stage, the predictions from different views are fused according to learned weights. Experimental results on in-vivo human brain and blood cell HSI datasets show that the overall classification accuracies reach 82.25% and 91.74%, respectively, demonstrating that the proposed S3AT algorithm yields enhanced classification performance on medical hyperspectral images.

Optics and Precision Engineering
Sep. 25, 2023, Vol. 31 Issue 18 2752 (2023)
Weakly supervised video instance segmentation with scale adaptive generation regulation
Yinhui ZHANG, Weiqi HAI, Zifen HE, Ying HUANG, and Dongdong CHEN

Video instance segmentation is critical for multi-target perception and scene understanding in assisted driving. However, because weakly supervised video instance segmentation relies only on bounding box annotations for network training, the segmentation accuracy for targets with large scale dynamic ranges in traffic scenes is severely restricted. To address this issue, we propose a scale-adaptive generation regulation weakly supervised video instance segmentation network (SAGRNet). First, a multi-scale feature mapping contribution dynamic adaptive control module is proposed to replace the original linear weighting. By dynamically adjusting the contributions of feature maps at different scales, the network can focus on both the local position and the global contour of the target, which addresses the large scale dynamic range caused by changes in the imaging distance of vehicles and pedestrians. Second, a target instance multi-fine-grained spatial information aggregation generation control module is constructed to regulate the feature maps of each scale using weight parameters obtained by aggregating multi-fine-grained spatial information extracted with different dilation rates. This module refines instance boundaries and improves the representation of cross-channel mask interaction information, effectively compensating for the lack of edge-contour continuity in segmentation masks caused by limited instance edge information. Finally, to alleviate the weak supervision derived from bounding-box-level annotations, orthogonal and color-similarity losses are introduced to reduce the deviation between the predicted mask and the real bounding box and to address the ambiguity in pixel-wise label attribution. Experimental results on a traffic scene dataset extracted from YouTube-VIS2019 indicate that SAGRNet improves the mean accuracy by 5.1%, reaching 38.1%, compared with the weakly supervised baseline. These results show that our method provides an effective basis for multi-target perception and instance-level scene understanding.

Optics and Precision Engineering
Sep. 25, 2023, Vol. 31 Issue 18 2736 (2023)
Joint self-attention and branch sampling for object detection on drone imagery
Yunzuo ZHANG, Cunyu WU, Yameng LIU, Tian ZHANG, and Yuxin ZHENG

Object detection in drone imagery is widely used in many fields. However, owing to complex image backgrounds, dense small objects, and dramatic scale changes, existing methods are insufficiently accurate. To solve this problem, we propose an accurate object detection method for drone imagery that jointly uses self-attention and branch sampling. First, a nested residual structure integrating self-attention and convolution is designed to effectively combine global and local information, enabling the model to focus on object areas and ignore invalid features. Second, we design a feature fusion module based on branch sampling to mitigate the loss of object information. Third, an improved detector for small objects is added to alleviate the problem of sharp scale changes, and a feature enhancement module is proposed to obtain more discriminative small-object features. The experimental results show that the proposed algorithm performs well in various scenarios. Specifically, the mAP50 and mAP of the s model on VisDrone2019 reach 59.3% and 37.1%, respectively, increases of 5.6% and 5.4% over the baseline; on UAVDT, the mAP50 and mAP reach 44.1% and 24.9%, respectively, increases of 5.8% and 3.2% over the baseline.

Optics and Precision Engineering
Sep. 25, 2023, Vol. 31 Issue 18 2723 (2023)
Uniform defocus blind deblurring based on deeper feature-based Wiener deconvolution
Chengxi WANG, Chen LUO, Jianghao ZHOU, Lang ZOU, and Lei JIA

In industrial precision manufacturing, the small depth of field of the imaging systems in visual inspection equipment makes them susceptible to defocus blurring, which significantly degrades their detection performance. To address this issue, this paper proposes a uniform defocus blind deblurring network (UDBD-Net). First, a uniform defocus blur kernel estimation network is proposed to extract the characteristics of out-of-focus blurring and accurately estimate the blur kernel. Second, a non-blind deconvolution network is presented, which learns to estimate the unknown quantities in the feature-based Wiener deconvolution (FWD) formula so as to accurately generate the latent features of blurred images. Finally, an encoder–decoder network is used to enhance the details of the recovered image and remove artifacts. The experimental results indicate peak signal-to-noise ratio (PSNR) values of 31.16 dB and 36.16 dB for UDBD-Net on the DIV2K and GOPRO test sets, respectively. Compared with existing blind deblurring methods, the proposed method restores deblurred images with higher quality and more naturalness without significantly increasing the model inference time. Furthermore, UDBD-Net achieves a good deblurring effect on real uniformly defocused blurred images and can considerably improve the performance of industrial vision detection algorithms on such images.
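
UDBD-Net applies Wiener deconvolution to learned features and predicts the unknown noise term with a network; the underlying classical operation it builds on can be sketched as follows (the fixed snr scalar stands in for the learned quantity):

    import numpy as np

    def wiener_deconv(blurred, kernel, snr=100.0):
        """Classic Wiener deconvolution in the Fourier domain:
        X = conj(K) * Y / (|K|^2 + 1/SNR)."""
        h, w = blurred.shape
        kh, kw = kernel.shape
        k_pad = np.zeros((h, w))
        k_pad[:kh, :kw] = kernel
        # center the kernel so the restored image is not shifted
        k_pad = np.roll(k_pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
        K = np.fft.fft2(k_pad)
        Y = np.fft.fft2(blurred)
        X = np.conj(K) * Y / (np.abs(K) ** 2 + 1.0 / snr)
        return np.real(np.fft.ifft2(X))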

Optics and Precision Engineering
Sep. 25, 2023, Vol. 31 Issue 18 2713 (2023)
Cross-scale and cross-dimensional adaptive transformer network for colorectal polyp segmentation
Liming LIANG, Anjun HE, Renjie LI, and Jian WU

To address the large scale variations, blurred boundaries, irregular shapes, and low contrast with normal tissue in colon polyp images, which lead to the loss of edge detail and mis-segmentation of lesion areas, we propose a cross-dimensional and cross-scale adaptive transformer segmentation network. First, the network uses transformer encoders to model the global contextual information of the input image and analyze colon polyp lesion areas at multiple scales. Second, channel attention and spatial attention bridges are used to reduce channel redundancy and enhance the model's spatial perception while suppressing background noise. Third, a multi-scale dense parallel decoding module is used to bridge the semantic gaps between cross-scale feature information at different layers, effectively aggregating multi-scale contextual features. Fourth, a multi-scale prediction module is designed for edge details, guiding the network to correct boundary errors in a learnable manner. Experiments on the CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, and ETIS datasets show Dice similarity coefficients of 0.942, 0.932, 0.811, and 0.805 and average intersection-over-union ratios of 0.896, 0.883, 0.731, and 0.729, respectively, surpassing existing methods. Simulation experiments show that our method effectively reduces the mis-segmentation of colon polyp lesion areas and achieves high segmentation accuracy, providing a new approach for colon polyp diagnosis.

Optics and Precision Engineering
Sep. 25, 2023, Vol. 31 Issue 18 2700 (2023)
Global and local feature fusion image dehazing
Xin JIANG, Haitao NIE, and Ming ZHU

Convolution operations, with their parameter-sharing property, focus primarily on extracting local image features but cannot model features beyond the range of the receptive field; moreover, when an entire image shares the same convolution kernel, the characteristics of different regions are ignored. To address these limitations of existing methods, a global and local feature fusion dehazing network is proposed. We utilize transformer and convolution operations to extract global and local feature information from images, respectively, then merge and output these features, effectively combining the advantage of transformers in modeling long-distance dependencies with the local perception of convolution operations to achieve efficient feature expression. Before the final output of restored images, we incorporate an enhancement module that includes multi-scale patches to further aggregate global feature information and enhance the details of the restored images using a transformer. Simultaneously, we introduce a global positional encoding generator, which adaptively generates positional encodings based on the global content of images, enabling 2D spatial modeling of the dependency relationships between pixels. Experimental results demonstrate the superior performance of the proposed dehazing network on both synthetic and real image datasets, producing more realistic restored images and significantly reducing detail loss.

Optics and Precision Engineering
Sep. 25, 2023, Vol. 31 Issue 18 2687 (2023)
Lightweight deep learning network for accurate localization of optical image components
Xiaoming NIU, Li ZENG, Fei YANG, and Guanghui HE

Precise optical image localization is crucial for improving industrial production efficiency and quality. Traditional image processing and localization methods have low accuracy and are vulnerable to environmental factors such as lighting and noise in complex scenes. Although classical deep learning networks have been widely applied in natural-scene object detection, industrial inspection, grasping, defect detection, and other areas, directly applying them to pixel-level precise localization of industrial components remains challenging owing to the need for massive training data, complex models, and redundant, imprecise detection boxes. To address these issues, this paper proposes a lightweight deep learning network for pixel-level accurate localization in component optical images. The overall design adopts an Encoder–Decoder architecture. The Encoder incorporates a three-level bottleneck cascade to reduce the parameter complexity of feature extraction while enhancing the network's nonlinearity. The Encoder and Decoder perform feature-layer fusion and concatenation, enabling the Decoder to obtain more high-resolution information after upsampling convolution and to reconstruct the original image details more comprehensively. Finally, the weighted Hausdorff distance is used to establish the relationship between the Decoder's output layer and the localization coordinates. Experimental results demonstrate that the lightweight localization network has a parameter size of 57.4 kB and a recognition rate of at least 99.5% for localization errors within 5 pixels. Thus, the proposed approach satisfies the requirements of high localization accuracy and precision and strong anti-interference capability for industrial component localization.
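
A simplified sketch of the weighted Hausdorff distance used as the localization loss (after the formulation of Ribera et al., with the generalized mean replaced by a hard minimum for brevity):

    import torch

    def weighted_hausdorff(prob_map, gt_points, d_max=None, eps=1e-6):
        """prob_map: (H, W) pixel-wise localization probabilities;
        gt_points: (M, 2) ground-truth (row, col) coordinates."""
        h, w = prob_map.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        coords = torch.stack([ys, xs], dim=-1).reshape(-1, 2).float()
        p = prob_map.reshape(-1)
        d = torch.cdist(coords, gt_points.float())       # (H*W, M) distances
        if d_max is None:
            d_max = float((h ** 2 + w ** 2) ** 0.5)      # image diagonal
        # every activated pixel should lie near some ground-truth point
        term1 = (p * d.min(dim=1).values).sum() / (p.sum() + eps)
        # every ground-truth point should attract some activated pixel
        term2 = (p.unsqueeze(1) * d
                 + (1 - p).unsqueeze(1) * d_max).min(dim=0).values.mean()
        return term1 + term2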

Optics and Precision Engineering
Sep. 10, 2023, Vol. 31 Issue 17 2611 (2023)
Lightweight deep global-local knowledge distillation network for hyperspectral image scene classification
Yingxu LIU, Chunyu PU, Diankun XU, Yichuan YANG, and Hong HUANG

To address the challenges posed by the complex spatial layouts of target scenes and the inherent spatial-spectral redundancy of hyperspectral images (HSIs), an end-to-end lightweight deep global–local knowledge distillation (LDGLKD) method is proposed. To explore the global sequence properties of spatial-spectral features, a vision transformer (ViT) is used as the teacher to guide a lightweight student model for HSI scene classification. In LDGLKD, pre-trained VGG16 is selected as the student model to extract local detail information. After collaborative training of the ViT and VGG16 through knowledge distillation, the teacher transmits the learned long-range contextual information to the small-scale student. By combining the advantages of the two models through knowledge distillation, LDGLKD reaches optimal classification accuracies of 91.62% and 97.96% on the Orbita HSI scene classification dataset (OHID-SC) and the hyperspectral remote sensing dataset for scene classification (HSRS), respectively. The experimental results show that the proposed LDGLKD method achieves good classification performance. In addition, OHID-SC, based on remote sensing data acquired by the Orbita Zhuhai-1 satellite, reflects detailed land cover information and provides data support for HSI scene classification.
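
The distillation step can be sketched with the standard temperature-scaled loss; the ViT teacher and VGG16 student are per the paper, while the temperature and weighting below are generic assumed values:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=4.0, alpha=0.7):
        """Soft-target KL term plus hard-label cross-entropy."""
        t = temperature
        soft = F.kl_div(F.log_softmax(student_logits / t, dim=1),
                        F.softmax(teacher_logits / t, dim=1),
                        reduction="batchmean") * (t * t)
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

During training the teacher typically runs in eval mode with gradients disabled, so only the student parameters are updated.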

Optics and Precision Engineering
Sep. 10, 2023, Vol. 31 Issue 17 2598 (2023)
Multidimensional attention mechanism and selective feature fusion for image super-resolution reconstruction
Jian WEN, Jianfei SHAO, Jie LIU, Jianlong SHAO, Yuhang FENG, and Rong YE

To address the poor extraction of low-resolution features and the blurred edges and artifacts caused by heavy loss of high-frequency information during image super-resolution reconstruction, this paper proposes an image super-resolution reconstruction method whose feature extraction module combines multidimensional attention with selective kernel feature fusion (SKFF). The network stacks several basic blocks with residual operations to form the feature extraction structure, whose core is a heterogeneous group convolution block for extracting image features. The symmetric group convolution block of this module performs convolutions in parallel to extract the internal information between different feature channels and then applies selective feature fusion. The complementary convolution block captures the contextual information missed across the spatial domain, the input–output dimension, and the kernel dimension using omni-dimensional dynamic convolution (ODConv). The features obtained from the symmetric group convolution and complementary convolution blocks are connected via a feature-enhanced residual block to remove redundant, interfering information. The rationality of the model design is demonstrated through five ablation experiments. In quantitative comparisons of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) with other mainstream super-resolution methods on the Set5, Set14, BSDS100, and Urban100 test sets, the proposed method shows consistent improvements, including a 0.06 dB gain over the CARN-M algorithm on Set5 at a scale factor of 3. The experimental results demonstrate that the proposed model achieves better performance indexes and visual effects.

Optics and Precision Engineering
Sep. 10, 2023, Vol. 31 Issue 17 2584 (2023)
Improved Retinex low-light image enhancement for stage scenes
Yuan JI, Xingyi LI, Xinde MA, and Liang LIAO

To address the limited luminance dynamic range in stage scenes, this study proposes a low-light stage enhancement method based on an improved Retinex algorithm. First, the low-light stage image is enhanced using the improved Retinex algorithm to obtain an overall enhanced image. Then, the original image is fused with the enhanced image, and background areas that are over-enhanced or do not need enhancement are processed to obtain the final image. The improved Retinex algorithm uses a Gauss-Laplace high-pass filter to separate the reflectance and illumination components, addressing the loss of detail in the reflectance component. It then performs contrast and detail enhancement on the reflectance component and multiplies it with the illumination component to produce the enhanced image. The method was verified on a field-programmable gate array (FPGA) hardware platform in addition to a software platform. The experimental results show that, compared with other classical methods, this method yields a noticeable visual improvement, with average increases of 57.06% in peak signal-to-noise ratio (PSNR) and 27.34% in structural similarity (SSIM) across different stage scenes. The improvement is particularly significant in stage scenes with substantial brightness differences between light and dark areas. The processed images restore the true luminance dynamic range of the stage and exhibit good natural color saturation without distortion, ensuring better image quality.
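
A rough single-scale sketch of the Retinex decomposition with a Gauss-Laplace detail boost; sigma and detail_gain are assumed illustrative values, and the over-enhancement fusion step described above is omitted:

    import cv2
    import numpy as np

    def retinex_enhance(gray, sigma=30.0, detail_gain=1.5):
        """Estimate illumination, enhance the reflectance with a
        Gauss-Laplace high-pass response, then recombine."""
        img = gray.astype(np.float32) + 1.0
        illum = cv2.GaussianBlur(img, (0, 0), sigma)      # illumination estimate
        reflect = img / illum                             # reflectance component
        # Laplacian of a lightly smoothed image = Gauss-Laplace high-pass
        hp = cv2.Laplacian(cv2.GaussianBlur(img, (0, 0), 1.0), cv2.CV_32F)
        reflect = reflect + detail_gain * hp / 255.0      # detail enhancement
        out = reflect * illum                             # multiply back
        return cv2.normalize(out, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)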

Optics and Precision Engineering
Sep. 10, 2023, Vol. 31 Issue 17 2573 (2023)
Ground point cloud segmentation based on locally adaptive threshold method
Peixiang ZHANG, Qi WANG, Renjing GAO, Yang XIA, and Zhenzhong WAN

The LiDAR point cloud ground segmentation algorithms in autonomous driving perception modules have low segmentation accuracy that requires improvement. To address this problem, a ground point cloud segmentation algorithm is proposed based on an adaptive seed-point distance threshold and road-fluctuation weighted amplitude. First, based on a polar-coordinate raster map division, the method links the seed-point selection threshold to the horizontal distance in the two-dimensional plane and controls the update of the seed point set through changes in the horizontal distance between point clouds. Second, during road model fitting, a slope continuity criterion is introduced to resolve the stagnation of model updates on sloped pavement. Finally, a point cloud segmentation threshold equation is established according to the change in the weighted amplitude of road surface fluctuation, achieving threshold segmentation that adapts to the distance characteristics of the point cloud. Binary classification experiments were performed on the open-source SemanticKITTI dataset to test the algorithm. The experimental results demonstrate that the proposed ground segmentation algorithm improves precision and recall by 2%-4% compared with existing algorithms, substantiating its high accuracy.

Optics and Precision Engineering
Sep. 10, 2023, Vol. 31 Issue 17 2564 (2023)
Partial optimal transport-based domain adaptation for hyperspectral image classification
Bilin WANG, Shengsheng WANG, and Zhe ZHANG

Hyperspectral image classification is a major task in remote sensing data processing. To address the inconsistent distributions of the labeled source domain and the unlabeled target domain, an unsupervised domain adaptation method based on partial optimal transport is proposed to achieve pixel-level classification of hyperspectral ground objects under different data distributions. Specifically, a deep convolutional neural network maps the samples to a high-dimensional latent space, and a sample transportation scheme is established based on partial optimal transport theory to minimize the distribution discrepancy between domains. Class-aware sampling and an adaptive mass-factor adjustment strategy are used to promote class alignment between domains and establish a globally optimal transport. Experiments were conducted on two open-source hyperspectral image datasets, and classification accuracies were compared quantitatively using three evaluation metrics: overall accuracy (OA, %), average accuracy (AA, %), and Kappa (×100). Compared with the source-only method, the proposed method improves OA and AA by 2.21% and 2.75%, respectively; compared with the original optimal transport, the improvements are 1.71% and 2.01%, respectively. These results show that the proposed model can effectively improve the pixel-level classification accuracy of hyperspectral images.

Optics and Precision Engineering
Sep. 10, 2023, Vol. 31 Issue 17 2555 (2023)
Remote sensing multi-scale object detection based on multivariate feature extraction and characterization optimization
Yuebo MENG, Fei WANG, Guanghui LIU, and Shengjun XU

Remote sensing objects exhibit large scale differences, which lead to difficulties in fine-grained multi-scale feature extraction and to weak effective representation in the prediction stage under complex background interference. To solve these problems, a multi-scale remote sensing object detection method (MFC) based on multivariate feature extraction and characterization optimization is proposed within an anchor-free framework. In the feature extraction stage, a multivariate feature extraction module (MFE) is designed to mine multi-scale features at a fine-grained level: it expands the receptive field through grouping operations and cross-group connections, enhances the combination of multiple feature scales, and strengthens the focus on small objects by incorporating context information. Deep and shallow features are fully integrated by a deep layer aggregation structure to obtain a more comprehensive feature expression. In the prediction stage, a characterization optimization strategy (COS) is proposed, which uses elliptical mapping to optimize the labels for remote sensing targets with large aspect ratios, and a coordinate-pixel attention is designed to focus on multi-scale object channel, position, and pixel information, reducing complex background interference and highlighting effective information. Ablation and comparison experiments on the DIOR, HRRSD, and RSOD datasets show that the MFC model reaches mAPs of 70.9%, 90.2%, and 96.9%, respectively, outperforming most existing methods. It effectively reduces false and missed detections and exhibits strong adaptability and robustness.

Optics and Precision Engineering
Aug. 25, 2023, Vol. 31 Issue 16 2465 (2023)
Review of multi-view stereo reconstruction methods based on deep learning
Huabiao YAN, Fangqi XU, Lü'er HUANG, Cibo LIU, and Chuxin LIN

The goal of multi-view stereo (MVS) reconstruction is to reconstruct a 3D model of a scene from a set of multi-view images with known camera parameters; it has become a mainstream 3D reconstruction approach in recent years. This paper provides an algorithm evaluation and comparison for hundreds of recent deep learning-based MVS methods. First, we organize the existing supervised methods according to the reconstruction pipeline of feature extraction, cost volume construction, cost volume regularization, and depth regression, focusing on improvement strategies in the cost volume construction and regularization stages. For unsupervised MVS methods, we mainly analyze the design of each algorithm's loss terms and classify the methods according to their training modes. Second, we summarize the common MVS datasets and their corresponding performance evaluation indexes, and further study the effect of strategies such as feature pyramid networks, attention mechanisms, and coarse-to-fine schemes on the performance of MVS networks. In addition, we introduce specific application scenarios of MVS methods, including digital twins, autonomous driving, robotics, heritage conservation, bioscience, and other fields. Finally, we offer suggestions for improving MVS methods and discuss future technical difficulties and research directions for MVS 3D reconstruction.

Optics and Precision Engineering
Aug. 25, 2023, Vol. 31 Issue 16 2444 (2023)
Multi-stage frame alignment video super-resolution network
Sen WANG, Yang ZHU, Yinhui ZHANG, Qingjian WANG, and Zifen HE

Video super-resolution (VSR) aims to reconstruct low-resolution video frame sequences into high-resolution ones. Compared with single-image super-resolution, VSR usually relies on highly correlated information in neighboring frames to reconstruct the current frame because of the additional temporal dimension; how to align adjacent frames and obtain highly correlated inter-frame information is the key issue in VSR. In this paper, the VSR task is divided into three stages: deblurring, alignment, and reconstruction. In the deblurring stage, the current frame is pre-aligned with adjacent frames to obtain feature information highly correlated with the current frame, and the details of the current frame are enhanced so that more feature information can be extracted in the initial stage. In the alignment stage, a secondary alignment operation on the input features uses the highly correlated information in adjacent frames to further strengthen the features of the current frame. In the reconstruction stage, the raw low-resolution frames are aggregated to provide more feature information at the end of the network. We use a multi-layer perceptron (MLP) instead of the traditional convolution operation to construct the feature extraction module and perform a secondary alignment of the generated features to refine the image features and obtain better reconstruction results. The experimental results show that the proposed algorithm achieves higher reconstruction accuracy on a variety of public datasets with fewer network parameters and more coherent reconstructed video sequences.

Optics and Precision Engineering
Aug. 25, 2023, Vol. 31 Issue 16 2430 (2023)
Superresolution reconstruction of infrared polarization microscan images in focal plane
Yizhe MA, Shiyong WANG, Teng LEI, Bohan LI, and Fanming LI

The imaging resolution of a focal-plane polarization detection system is lower than the actual resolution of the detector owing to the influence of the detector structure. In this study, micro-scanning is used to obtain a micro-displacement frame sequence without changing the optical system structure, and an improved projection onto convex sets (POCS) algorithm is proposed to improve the imaging resolution of the polarization imaging system. In the algorithm, the acquired polarization microscan image sequences are first separated by angle detection, and each group of same-angle detection images is used as the input. Second, displacement matching and convex set projection iterations are conducted to initially reconstruct high-resolution images. Thereafter, the images are grouped by sliding-window non-neighborhood clustering, and the dimensionality of the clustered images is reduced through principal component analysis. Finally, each one-dimensional component is regarded as a time sampling function, and soft-threshold denoising is conducted in the wavelet domain. Experiments demonstrate that this algorithm effectively improves the anti-noise performance of the conventional POCS algorithm and the imaging resolution of the focal-plane polarization detection system, increasing the structural similarity coefficient by 0.02 and the peak signal-to-noise ratio by 1 dB compared with similar algorithms, with higher noise robustness.
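
The core POCS data-consistency projection can be sketched as follows for integer HR-pixel shifts; the angle separation, clustering, and wavelet denoising stages are omitted, and delta is an assumed consistency band:

    import numpy as np

    def pocs_sr(lr_frames, shifts, scale=2, n_iter=20, delta=2.0):
        """Minimal POCS super-resolution: lr_frames is a list of (h, w)
        arrays, shifts the per-frame integer (dy, dx) displacements in
        HR pixels (shifts[0] assumed (0, 0))."""
        hr = np.kron(lr_frames[0], np.ones((scale, scale)))  # initial guess
        for _ in range(n_iter):
            for y, (dy, dx) in zip(lr_frames, shifts):
                # simulate the LR observation from the current HR estimate
                sim = np.roll(hr, (-dy, -dx), axis=(0, 1))[::scale, ::scale]
                res = y - sim
                # project onto the consistency set: correct only the part
                # of the residual that exceeds the tolerance band delta
                corr = np.sign(res) * np.maximum(np.abs(res) - delta, 0.0)
                up = np.kron(corr, np.ones((scale, scale))) / scale ** 2
                hr += np.roll(up, (dy, dx), axis=(0, 1))
        return hr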

Optics and Precision Engineering
Aug. 25, 2023, Vol. 31 Issue 16 2418 (2023)
Equipment fault dataset amplification method combining 3D models with an improved CycleGAN
Baoping LI, Hengyi QI, Manli WANG, and Po WEI

The performance of deep-learning-based equipment fault detection systems relies heavily on the size and class diversity of the sample set. Because it is difficult to comprehensively collect all types of fault samples in industrial production, sample set augmentation is needed. A fault dataset amplification method combining 3D modeling with an improved cycle generative adversarial network (CycleGAN) is proposed. First, various equipment malfunction images generated by 3D modeling software are used in CycleGAN training to guide it in generating pseudo-real images, addressing insufficient and unevenly distributed samples. Second, a U-ResNet generator is used in the CycleGAN to solve the edge blurring and gradient vanishing problems during network training. The method was applied to belt conveyor deviation detection. The experimental results show that the contour structures generated by the method converge quickly during training and that the method is more time-efficient than other amplification methods. When the amplified dataset is applied to the target detection network, the accuracy rate is 98.1%, which is 4.5% higher than that achieved with the original real dataset alone. The method meets the basic requirements of a balanced distribution of amplified datasets and high image quality.

Optics and Precision Engineering
Aug. 25, 2023, Vol. 31 Issue 16 2406 (2023)
Review of deep learning-based algorithms for ship target detection from remote sensing images
Zexian HUANG, Fanlu WU, Yao FU, Yu ZHANG, and Xiaonan JIANG

The detection of ship targets is a key area of research in remote sensing image processing and pattern recognition, and automatic ship detection is crucial to both civil and military applications. In this study, we discuss and analyze the advantages and disadvantages of typical deep-learning-based target detection algorithms and summarize state-of-the-art deep-learning-based ship target detection methods. We provide a detailed introduction to five aspects of these methods: multi-scale detection, multi-angle detection, small target detection, model light-weighting, and large-format wide remote sensing imaging. We also introduce the common evaluation criteria for ship target recognition algorithms and the existing ship image datasets, and discuss the current problems faced by ship detection algorithms for remote sensing images and future development trends in the field.

Optics and Precision Engineering
Aug. 10, 2023, Vol. 31 Issue 15 2295 (2023)
Simulating the primary visual cortex to improve the robustness of CNN structures
Lijuan ZHANG, Mengda HU, Ziwei ZHANG, Yutong JIANG, and Dongming LI

The robustness of convolutional neural network (CNN) models is usually improved by deepening the network to ensure result accuracy; however, adding layers makes the network more complex and occupies more space. This paper proposes an improved CNN modeling method based on human visual features: structural features of human vision are fused into the CNN to improve its robustness to noise without increasing the number of layers or affecting the model's original accuracy. Experimental results on the Cifar10 dataset show that the classification accuracy of the proposed VVNet is almost the same as that of the original network on clean images and is approximately 10% higher on corrupted images. Compared with the original deep-learning network, the network based on the human visual system structure effectively enhances robustness while maintaining the original accuracy.

Optics and Precision Engineering
Aug. 10, 2023, Vol. 31 Issue 15 2287 (2023)
Image super-resolution reconstruction based on attention and wide-activated dense residual network
Qiqi KOU, Chao LI, Deqiang CHENG, Liangliang CHEN, Haohui MA, and Jianying ZHANG

To address the blurring of texture details in reconstructed images caused by insufficient use of global and local high- and low-frequency spatial information, this paper proposes an image super-resolution reconstruction model based on attention and a wide-activated dense residual network. First, four parallel convolution kernels of different scales are used to fully extract the low-frequency features of the image as prior information for spatial feature transformation. Second, a wide-activated residual block fused with attention is constructed in the deep feature mapping module, and the low-frequency prior information guides the extraction of high-frequency features. The wide-activated residual block extracts deeper feature maps by expanding the number of feature channels before the activation function. As a result, the constructed global and local residual connections not only strengthen the forward propagation of the residual blocks and network features but also enrich the diversity of the extracted features without increasing the number of parameters. Finally, the feature map is upsampled and reconstructed to obtain a clear high-resolution image. The experimental results show that, compared with the LatticeNet model, the proposed algorithm improves the peak signal-to-noise ratio by 0.14 dB and the structural similarity by 0.001 at 4x super-resolution on the BSD100 dataset, and the local texture details of the reconstructed images are subjectively clearer.
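
The wide-activation idea (expanding channels before the activation so that more low-level information survives ReLU) can be sketched as a WDSR-style block; the expansion factor is an assumed value:

    import torch.nn as nn

    class WideActivatedBlock(nn.Module):
        """Residual block that widens features before ReLU, then compresses."""
        def __init__(self, channels, expansion=4):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels * expansion, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels * expansion, channels, 3, padding=1))

        def forward(self, x):
            return x + self.body(x)      # local residual connection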

Optics and Precision Engineering
Aug. 10, 2023, Vol. 31 Issue 15 2273 (2023)
Light spot detection of diamond wire based on deep learning
Zongqiang FENG, Yipeng YING, Fujun ZHANG, Yongbo YU, and Yi LIU

Wire break detection is an important part of the diamond wire production process. To address the low sensitivity and delayed feedback of existing contact-based detection, a non-contact wire break detection method is proposed based on machine vision detection of the light spots reflected by diamond wires under strong light. To overcome the complex operation of traditional image processing for spot detection and its sensitivity to external illumination, the study investigates deep-learning-based spot detection on an embedded platform. A variety of YOLO-type models were trained and deployed. To address the poor real-time performance on embedded devices caused by the deep network and large volume of the original model, a lightweight spot detection model, MCA-Yolox, based on Yolox is proposed. The MobileNetV3 lightweight feature extraction network replaces the backbone of the Yolox model, and the enhanced feature extraction network is made lightweight using depthwise separable convolutions and inverted residual structures. Combined with the CA attention mechanism, the detection accuracy of the lightweight model is improved. Finally, the improved model is deployed on the embedded platform. The experimental results show that the size and computation of MCA-Yolox are reduced to less than 1/3 of those of the Yolox model, and its detection accuracy is higher than that of Yolox-Tiny and Yolov4-Tiny at the same scale. The mAP of the model increases by more than 1%, and the detection speed reaches 30 frames/s after accelerated optimization. In summary, this paper presents a complete deep-learning-based industrial detection scheme for diamond wire breaks.

Optics and Precision Engineering
Aug. 10, 2023, Vol. 31 Issue 15 2260 (2023)
Fine 3D reconstruction methods with Gaofen-7 satellite stereo images
Danchao GONG, Songlin LIU, Yilong HAN, and Wei ZHANG

To achieve high-precision, fine 3D reconstruction with sub-meter Gaofen-7 satellite images, this paper proposes a method that focuses on the relative error correction of stereo pairs, image horizontal-plane correction, and semi-global matching optimization, forming a fine 3D reconstruction pipeline. First, concerning the relative error in the orientation model of Gaofen-7 stereo images, the geometric constraints of connection points among images are used to eliminate the systematic error of the rational function model. Second, a horizontal correction method based on the object-space projection plane is used to correct the original images; this eliminates the large inclination-angle differences between stereo images and provides a better data basis for subsequent processing. In the dense matching stage, globally available digital elevation model (DEM) data are used as the disparity constraint, and AD-Census, which considers both the grayscale and feature information of the image, is employed as the matching cost metric, reducing the matching errors caused by repeated textures and improving digital surface model (DSM) production. Experiments using Gaofen-7 stereo images covering areas in Ningxia and Xinjiang indicate that the proposed method improves the relative error accuracy from 0.847 and 0.725 pixels to 0.652 and 0.593 pixels, respectively, an improvement of up to 23.02%. The horizontal correction method based on the object-space projection plane significantly reduces the geometric distortion caused by large inclination-angle differences, and good-quality DSM products can be obtained, especially for the repetitive textures of small-scale dense building areas.
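
The AD-Census cost combines an absolute-difference term with a census (local binary pattern) term, each squashed to [0, 1); a compact sketch for a single disparity hypothesis, with assumed lambda values and wrap-around shifting used for brevity:

    import numpy as np

    def census(img, window=5):
        """Census transform: one bit per neighbor, set when the neighbor
        is darker than the window center."""
        r = window // 2
        code = np.zeros(img.shape, dtype=np.uint64)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if dy == 0 and dx == 0:
                    continue
                nb = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
                code = (code << np.uint64(1)) | (nb < img).astype(np.uint64)
        return code

    def popcount(codes):
        """Per-pixel count of set bits in a uint64 code image."""
        bits = np.unpackbits(codes.view(np.uint8), axis=1)
        return bits.reshape(codes.shape[0], codes.shape[1], 64).sum(axis=2)

    def ad_census_cost(left, right, d, lam_ad=10.0, lam_census=30.0):
        """Fused cost for disparity d: (1-exp(-AD/lam)) + (1-exp(-Hamming/lam))."""
        right_d = np.roll(right, d, axis=1)          # align right image
        ad = np.abs(left.astype(float) - right_d.astype(float))
        ham = popcount(census(left) ^ census(right_d))
        return (1 - np.exp(-ad / lam_ad)) + (1 - np.exp(-ham / lam_census))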

Optics and Precision Engineering
Jul. 25, 2023, Vol. 31 Issue 14 2147 (2023)
Image reconstruction based on deep compressive sensing combined with global and local features
Yuanhong ZHONG, Qianfeng XU, Yujie ZHOU, and Shanshan WANG

Effectively restoring the original signal with high probability and high quality from a very small number of measurements is the core issue of compressive sensing for image reconstruction. Researchers have proposed both traditional and deep learning-based compressive sensing image reconstruction algorithms: traditional algorithms are based on mathematical derivation and are interpretable but reconstruct relatively poorly, whereas deep learning-based algorithms reconstruct well but lack interpretability. Inspired by filter flow, this study proposes a global-to-local compressive sensing image reconstruction model, G2LNet, which performs compressed sampling and initial reconstruction with convolutional layers and uses fast Fourier convolution and a convolutional filter flow, thereby accounting for both the global contextual information of the image and the local neighborhood information of image pixels. It jointly optimizes the measurement matrix and the convolutional filter flow, forming a complete end-to-end trainable deep image reconstruction network. Verification experiments were performed at a 20% sampling rate on the Set5, Set11, and BSD68 test datasets commonly used in compressive sensing image reconstruction. Compared with the traditional MH algorithm and the deep learning-based CSNet, G2LNet increases the average peak signal-to-noise ratio by 2.29 dB and 0.51 dB, respectively, effectively improving the quality of the reconstructed images.

Optics and Precision Engineering
Jul. 25, 2023, Vol. 31 Issue 14 2135 (2023)
Weather recognition combining improved ConvNeXt models with knowledge distillation
Libo LIU, Siyu XI, and Zhen DENG

A weather recognition method combining an improved ConvNeXt network and knowledge distillation is proposed to improve the accuracy of weather recognition in complex traffic scenes while keeping the network lightweight. First, the ConvNeXt_F network was constructed by adding the SimAm attention mechanism after each set of Block feature extractions of the ConvNeXt network to re-weight the extracted deep features and strengthen the capture of discriminative weather features. Second, during training, the equalized focal loss (EFL) and mutual-channel loss (MCL) were aggregated into the total loss function using the average occupancy ratio; EFL counteracts the effect of data imbalance, and MCL reduces the differences between local detail features under similar weather. Finally, knowledge distillation was used to transfer weather classification knowledge from the ConvNeXt_F network to the lightweight MobileNetV3 network, with a marginal loss of accuracy but a significant reduction in the number of parameters. The experimental results show that, compared with other algorithms, the proposed method achieves 96.22% and 84.8% accuracy on a weather-traffic dataset from the Ningxia expressway and the publicly available natural weather dataset RSCM2017, respectively, with FPS values of 157.6 Hz and 137.6 Hz, FLOPs of 0.06 G, and 2.54 M parameters. Compared with the original network, the recognition accuracy and speed are improved and the network is lighter, making it better suited to practical scenarios with limited storage and computing power.

Optics and Precision Engineering
Jul. 25, 2023, Vol. 31 Issue 14 2123 (2023)
Low-light image enhancement algorithm based on multi-channel fusion attention network
Qingjiang CHEN, and Yuan GU

Low-light images have low brightness, low contrast, and color distortion, and most existing enhancement algorithms do not treat different channels differently, which hinders the extraction of multi-level features. Therefore, this study proposes a low-light image enhancement algorithm based on a multi-channel fusion attention network. First, we introduced octave convolution (OctConv) into the residual structure after channel splitting and proposed a multi-level feature extraction module. Second, we proposed a cross-scale feature attention module using an attention mechanism and a cross-residual structure. Third, we obtained multi-level information by stacking modules with different sizes and channel counts. Finally, we performed feature fusion in the channel dimension and obtained the final output through a reconstruction module. The experimental results showed that, compared with the RISSNet algorithm, the peak signal-to-noise ratio and structural similarity on real images improved from 27.001 6 dB and 0.889 2 to 27.978 1 dB and 0.925 5, respectively. The proposed algorithm achieved the best results in four objective evaluation indicators: peak signal-to-noise ratio, structural similarity, mean squared error, and visual information fidelity. The algorithm can effectively improve the brightness and contrast of low-light images while maintaining image textures and colors.
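For reference, below is a minimal sketch of the published octave convolution operator that the module above builds on (Chen et al., 2019); it is the textbook operator, not the paper's exact residual module, and assumes even spatial dimensions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctConv(nn.Module):
    """Minimal octave convolution: channels split into a high-frequency map
    at full resolution and a low-frequency map at half resolution, with four
    cross-resolution convolution paths. alpha = fraction of low-freq channels."""
    def __init__(self, in_ch, out_ch, alpha=0.5, k=3):
        super().__init__()
        in_lo, out_lo = int(in_ch * alpha), int(out_ch * alpha)
        in_hi, out_hi = in_ch - in_lo, out_ch - out_lo
        p = k // 2
        self.hh = nn.Conv2d(in_hi, out_hi, k, padding=p)
        self.hl = nn.Conv2d(in_hi, out_lo, k, padding=p)
        self.lh = nn.Conv2d(in_lo, out_hi, k, padding=p)
        self.ll = nn.Conv2d(in_lo, out_lo, k, padding=p)

    def forward(self, x_hi, x_lo):
        # high->low uses average pooling; low->high uses upsampling
        y_hi = self.hh(x_hi) + F.interpolate(self.lh(x_lo), scale_factor=2,
                                             mode="nearest")
        y_lo = self.ll(x_lo) + self.hl(F.avg_pool2d(x_hi, 2))
        return y_hi, y_lo
```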

Optics and Precision Engineering
Jul. 25, 2023, Vol. 31 Issue 14 2111 (2023)
REC-ResNet: Feature enhancement model for COVID-19 aided diagnosis
Tao ZHOU, Yuncan LIU, Senbao HOU, Xinyu YE, and Huiling LU

Optics and Precision Engineering
Jul. 25, 2023, Vol. 31 Issue 14 2093 (2023)
Generative adversarial network for super-resolution reconstruction of remote sensing images by fusing edge enhancement and non-local modules
Jie LIU, Ruo QI, and Ke HAN

To address the severe noise pollution in remote sensing imaging and the object edge blurring and artifacts in super-resolution reconstructed images, this study proposes a remote sensing image super-resolution algorithm called the edge-enhanced and non-local modules generative adversarial network (ENGAN). To make image edge details clearer, the algorithm integrates an image edge enhancement module. To further expand the receptive field of the model and enhance edge noise removal, the Mask branch in the edge enhancement module was also improved. Exploiting the intrinsic feature correlation of images further improved the reconstruction performance of the network. Comparison experiments with multiple algorithms were performed on two remote sensing image datasets, UCAS-AOD and NWPU VHR-10, and the proposed method showed improvement in multiple evaluation indicators. Taking degradation type IV as an example, the 4× super-resolution SSIM increased by 0.068, the PSNR increased by 1.400 dB, and the RMSE decreased by 12.5% compared with the deep blind super-resolution degradation model. Moreover, the reconstructed remote sensing images yield better ground target detection results than the original images.

Optics and Precision Engineering
Jul. 25, 2023, Vol. 31 Issue 14 2080 (2023)
Semi-supervised instance object detection method based on SVD co-training
Rui WANG, Siyang FAN, Jingwen XU, and Zhiqing WEN

Detecting indoor instance objects is useful for various applications. Traditional deep-learning methods require a large number of labeled samples for network training, making them time-consuming and labor-intensive. To address this problem, SVD-RCNN—a semi-supervised instance object detection network based on singular value decomposition (SVD) and co-training—is proposed. First, key samples are selected for manual labeling and used to pre-train SVD-RCNN, ensuring that it acquires sufficient prior knowledge. Second, a convergence, decomposition, and fine-tuning strategy based on SVD is used to obtain two detectors with strong independence in SVD-RCNN, satisfying the requirements of co-training. Finally, an adaptive self-labeling strategy is used to obtain high-quality self-labeled and detection results. The method was tested on multiple indoor instance datasets. On the GMU dataset, it achieved a mean average precision of 79.3% with 199 manually labeled samples, which was only 2% lower than that (81.3%) of Faster RCNN with fully supervised learning, which required labeling 3 851 samples. Ablation studies and a series of experiments confirmed the effectiveness and universality of the method. The results indicated that the method needs to manually label only 5% of the training data to achieve instance-level detection accuracy comparable to that of fully supervised learning; thus, it is suitable for applications in which intelligent robots must efficiently identify different instance objects.
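As a loose illustration of how SVD can yield two detector branches from one trained head, the sketch below factorizes a fully connected layer and rebuilds two heads from complementary singular components; this is an illustrative assumption, not the paper's convergence-decomposition-finetuning procedure.

```python
import torch
import torch.nn as nn

def split_head_by_svd(fc: nn.Linear):
    """Illustrative only: factorize a trained layer W = U S V^T and build
    two parallel heads from the odd and even singular components, giving
    two detectors that share the backbone but differ in dominant subspaces."""
    W = fc.weight.data                                   # (out, in)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    heads = []
    for idx in (slice(0, None, 2), slice(1, None, 2)):   # even/odd components
        W_branch = U[:, idx] @ torch.diag(S[idx]) @ Vh[idx, :]
        head = nn.Linear(fc.in_features, fc.out_features)
        head.weight.data.copy_(W_branch)
        if fc.bias is not None:
            head.bias.data.copy_(fc.bias.data)
        heads.append(head)
    return heads
```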

Optics and Precision Engineering
Jul. 10, 2023, Vol. 31 Issue 13 2000 (2023)
Ground deformation analysis along island subway line by fusing time-series InSAR and LiDAR techniques
Ming GUO, Xingyu TANG, Yunming LIU, Changwei WANG, and Yujin WEI

The construction and operation of subways cause varying degrees of ground impact, particularly on islands, which have complex topographic and geomorphological environments, unstable soil layers, and ground deformation that is difficult to analyze. The objective of this study was to address these problems. Using 38 scenes of Sentinel-1 image data and independently collected tunnel point-cloud data, permanent scatterer interferometric synthetic aperture radar (PS-InSAR) technology, small baseline subset (SBAS-InSAR) time-series analysis technology, and LiDAR technology were employed to study the area along Metro Line 2 within Xiamen Island. The results of the two SAR-based techniques were compared with leveling data to verify their accuracy. Finally, the deformation results were compared with those of the LiDAR technique, which obtains point-cloud data by scanning the interior of the subway tunnel. The experimental results show that the deformation rate based on PS-InSAR technology ranged from -24.21 to 24.19 mm/y, whereas that based on SBAS-InSAR technology ranged from -22.86 to 33.79 mm/y; the two methods were essentially consistent in deformation monitoring. The comparison with the leveling results indicates that the error between the two was concentrated within ±7 mm and reached approximately 13 mm at some leveling points. Meanwhile, the settlement range of the underground tunnel was -43.4 to 104.1 mm, with a root-mean-square error of 28.44 mm, and this result differed significantly from the ground deformation results. After several verifications, it was confirmed that the underground rail transit tunnel is less affected by ground deformation.

Optics and Precision Engineering
Jul. 10, 2023, Vol. 31 Issue 13 1988 (2023)
Automatic segmentation of aggregate images with MET optimized by chaos SSA
Mengfei WANG, Weixing WANG, and Limin LI

The computational complexity of multiple entropy thresholding (MET) increases exponentially with the number of thresholds K, and related optimization strategies exhibit low accuracy and stability, with the segmented aggregate images lacking considerable feature information such as surface roughness and edges. To overcome these problems, an automatic image segmentation model based on a chaotic sparrow search algorithm (SSA) was developed to optimize MET. SSA is a relatively new intelligent optimization algorithm. To enhance its global optimization capability and robustness, a logistic map is applied to the uniform sparrow distribution during population position initialization, an expansion parameter is applied to widen the global search, and local stagnation is avoided through range-controlled elite mutation jumps. This algorithm, called logistic SSA (LSSA), improves the solution quality without reducing the convergence speed. LSSA is used for the automatic selection of MET parameters, with the Rényi entropy, symmetric cross entropy, and Kapur entropy as objective functions to quickly determine the correct thresholds. In this study, image segmentation and algorithm comparison experiments were conducted on aggregate images with different characteristics. The effectiveness of LSSA-MET was demonstrated by comparing six combined algorithms with the fuzzy C-means (FCM) algorithm. The proposed algorithm maintains a relatively high speed as K increases, taking 1.532 s on average to segment an image even when K=6. Among the various entropies, the LSSA-Rényi entropy performed best, achieving improvements of 29.92%, 10.67%, and 5.16% in peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and feature similarity (FSIM), respectively, thereby effectively retaining the aggregate surface texture and edge characteristics while achieving the optimum balance between precision and speed.
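The chaotic initialization step is simple enough to sketch directly; the warm-up count and bounds handling below are illustrative assumptions.

```python
import numpy as np

def logistic_init(pop_size, dim, lb, ub, mu=4.0):
    """Chaotic population initialization for SSA: iterate the logistic map
    x_{k+1} = mu * x_k * (1 - x_k) per dimension so the sparrow positions
    cover the search space more uniformly than plain uniform sampling."""
    x = np.random.rand(pop_size, dim)
    # nudge values away from the fixed points of the mu=4 logistic map
    x[np.isin(x, [0.0, 0.25, 0.5, 0.75, 1.0])] += 1e-3
    for _ in range(10):                    # a few chaotic warm-up iterations
        x = mu * x * (1.0 - x)
    return lb + (ub - lb) * x              # scale into [lb, ub]
```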

Optics and Precision Engineering
Jul. 10, 2023, Vol. 31 Issue 13 1973 (2023)
MMShip: medium resolution multispectral satellite imagery ship dataset
Li CHEN, Linhan LI, Shiyong WANG, Sili GAO, and Xiangzhou YE

Considering that the existing remote-sensing ship datasets consist entirely of cropped images, detection algorithms trained on them perform poorly when applied directly to satellite images at their original scale. In this study, a multispectral satellite ship dataset, MMShip, covering four visible and near-infrared (NIR) bands was established. The dataset includes both original-scale satellite image data and cropped small-scale ship data. Owing to the introduction of multi-band information, this dataset compensates for the shortcoming that most existing datasets contain only visible-band images, which are easily affected by illumination conditions. Sentinel-2 satellite images of oceans worldwide with cloud cover below 3% were downloaded. After atmospheric correction, only the four 10-m-resolution bands—red, green, blue, and NIR—were retained, and the images containing ships were screened by scene. Next, the screened images were divided into non-overlapping 512 × 512 tiles, and tiles that did not contain ship targets were eliminated. The LabelImage software was used to annotate the small-scale data with horizontal boxes, and the labeled data were then converted back to the original scale to obtain annotations at the original scale. Finally, several typical detection algorithms were used to perform visible-light, near-infrared, and multispectral comparison experiments on the MMShip small-scale dataset. In this study, a multispectral satellite ship target dataset covering different scenes was constructed, comprising 497 labeled original-scale images and 5 016 groups of cropped ship target images. The comparison experiments confirmed that adding near-infrared band information can increase the accuracy of ship target detection algorithms. The MMShip dataset can support research on multispectral ship target detection algorithms at both the satellite-image and ordinary-image scales.
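The tiling step above is a common preprocessing pattern; a minimal sketch follows, with the offset bookkeeping (used to map patch-level boxes back to scene coordinates) as an illustrative assumption.

```python
import numpy as np

def tile_scene(image: np.ndarray, size: int = 512):
    """Cut an original-scale scene (H, W, C) into non-overlapping
    size x size tiles, recording each tile's offset so patch-level
    annotations can later be converted back to scene coordinates.
    Border remainders smaller than `size` are dropped here."""
    h, w = image.shape[:2]
    tiles = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            tiles.append(((x, y), image[y:y + size, x:x + size]))
    return tiles   # list of ((x_offset, y_offset), tile)
```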

Optics and Precision Engineering
Jul. 10, 2023, Vol. 31 Issue 13 1962 (2023)
Cross-scene hyperspectral image classification combined spatial-spectral domain adaptation with XGBoost
Aili WANG, Shanshan DING, He LIU, Haibin WU, and Yuji IWAHORI

To solve the problem of spectral shift between the source and target domains in cross-scene hyperspectral remote sensing image classification, this study proposes a cross-scene hyperspectral image classification model combining spatial-spectral domain adaptation and eXtreme Gradient Boosting (XGBoost). First, the Depth Over Parametric Convolution Model (DOCM) and Large Kernel Attention (LKA) were combined into a spatial-spectral attention model to extract the spatial-spectral features of the source domain. The same spatial-spectral attention model was then used to extract features from the target domain, and a discriminator was used for adversarial domain adaptation to reduce the spectral shift between the source and target domains. Next, the feature extractor of the target domain was adapted in a supervised manner using a small amount of labeled target-domain data, so that it could learn the true distribution of the target domain, map the features of the source and target domains into a similar spatial distribution, and complete the clustering domain adaptation. Finally, the ensemble classifier XGBoost was used to classify the hyperspectral images, further improving the training speed and confidence of the model. Experimental results on the Pavia and Indiana hyperspectral datasets indicate that the overall classification accuracy of this algorithm reaches 91.62% and 65.98%, respectively. Compared with other cross-scene hyperspectral image classification models, the proposed model achieves higher classification accuracy.
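The final classification stage is standard XGBoost usage; a minimal sketch is shown below on synthetic stand-in features (in the paper these would come from the adapted feature extractor), with all hyperparameters illustrative.

```python
import numpy as np
from xgboost import XGBClassifier

# Synthetic stand-ins for the domain-adapted spatial-spectral features.
rng = np.random.default_rng(0)
feats_train, labels_train = rng.normal(size=(500, 64)), rng.integers(0, 9, 500)
feats_test, labels_test = rng.normal(size=(200, 64)), rng.integers(0, 9, 200)

clf = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1,
                    tree_method="hist")      # histogram-based trees for speed
clf.fit(feats_train, labels_train)
overall_accuracy = (clf.predict(feats_test) == labels_test).mean()
```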

Optics and Precision Engineering
Jul. 10, 2023, Vol. 31 Issue 13 1950 (2023)
Few-shot object detection on Thangka via multi-scale context information
Wenjin HU, Huiyuan TANG, Chaoyang YUE, and Huafei SONG

Classifying and locating objects of interest in Thangka images can help people understand the rich semantic information of Thangka and promote cultural inheritance. To address the problems of insufficient Thangka image samples, complex backgrounds, occluded detection targets, and low detection accuracy, this paper proposes a few-shot object detection algorithm for Thangka images that combines multi-scale context information with dual attention guidance. First, a new multi-scale feature pyramid is constructed to learn the multi-level features and contextual information of Thangka images and improve the model's ability to discriminate multi-scale targets. Second, a dual attention guidance module is added at the end of the feature pyramid to improve the model's representation of key features while reducing the impact of noise. Finally, Rank&Sort Loss replaces the cross-entropy classification loss, which simplifies the model training process and increases the detection accuracy. Experimental results indicate that the proposed method achieved mean average precisions of 19.7% and 11.2% in 10-shot experiments on a Thangka dataset and the COCO dataset, respectively.

Optics and Precision Engineering
Jun. 25, 2023, Vol. 31 Issue 12 1859 (2023)
Automatic threshold selection method using exponential Rényi entropy under multi-scale product in stationary wavelet domain
Yaobin ZOU, Xiangdan MENG, Shuifa SUN, and Peng CHEN

When a gray-level image is affected by different factors, such as the size ratio of the target to the background, noise, or random details, its gray-level histogram exhibits peakless, unimodal, bimodal, or multimodal patterns. To deal with the issue of automatic threshold selection in these four situations within a unified framework, an automatic threshold selection method using the exponential Rényi entropy under the multi-scale product in the stationary wavelet domain is proposed. First, stationary wavelet multi-scale transformation is applied to the original gray-level image in the horizontal, vertical, and diagonal directions, and a fused image is constructed via the multi-scale multiplication of high-frequency sub-bands in each direction. Then, the fused image is sampled by the inner and outer contour image to construct a one-dimensional gray-level histogram. Finally, the exponential Rényi entropy corresponding to the constructed histogram is calculated, and the threshold corresponding to the maximum exponential Rényi entropy is taken as the final threshold. The proposed method was compared with four automatic threshold segmentation methods, two clustering segmentation methods, and two active contour segmentation methods. The experimental results for 16 synthetic images and 50 real-world images indicated that with regard to the segmentation accuracy, the proposed method outperformed the second-best method by 41.2% and 20.8% in terms of the average Matthews correlation coefficient for the synthetic and real-world images, respectively. Although the proposed method has no advantage with regard to computational efficiency, it has more robust segmentation adaptability and a higher segmentation accuracy than the other eight segmentation methods.
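For intuition, the sketch below maximizes the plain (non-exponential) Rényi entropy over a single threshold of a 256-bin histogram; the paper's exponential variant and multi-scale construction differ only in the entropy functional and the histogram being thresholded, and alpha here is an illustrative parameter.

```python
import numpy as np

def renyi_threshold(hist, alpha=0.5):
    """Select the threshold t maximizing the sum of Rényi entropies
    H = ln(sum p^alpha) / (1 - alpha) of the two classes (alpha != 1)."""
    p = hist.astype(np.float64) / hist.sum()
    best_t, best_val = 0, -np.inf
    for t in range(1, len(p) - 1):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 <= 0 or w1 <= 0:
            continue
        h0 = np.log(((p[:t] / w0) ** alpha).sum()) / (1 - alpha)
        h1 = np.log(((p[t:] / w1) ** alpha).sum()) / (1 - alpha)
        if h0 + h1 > best_val:
            best_val, best_t = h0 + h1, t
    return best_t
```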

Optics and Precision Engineering
Jun. 25, 2023, Vol. 31 Issue 12 1841 (2023)
Image dehazing based on polarization optimization and atmosphere light correction
Jing WU, Wenjie SONG, Cuixia GUO, Xiaojing YE, and Feng HUANG

To improve the recovery ability of polarization dehazing algorithms in fog scenes, a polarization image dehazing algorithm based on polarization optimization and atmospheric light correction is proposed. First, according to the brightness distribution of the fog scene, the fog image was decomposed into bright and dark residuals via guided filtering. Second, to optimize the degree of polarization, the degrees of polarization corresponding to the bright and dark residuals were increased and decreased, respectively; this optimized degree of polarization can blur the atmospheric light image. The difference in the degree of polarization between the residuals was used to correct the atmospheric light, ensuring that its intensity range met the atmospheric degradation model. Experiments indicated that the contrast after dehazing was 3.07 times that of the original hazy images and that the entropy and standard deviation of the dehazed images increased by 9.21% and 61.86%, respectively. In environments with different concentrations of simulated fog, the proposed algorithm achieved excellent SSIM, CIEDE2000, and PSNR values. Compared with state-of-the-art dehazing algorithms, the proposed algorithm produced a clearly visible improvement and recovered scene details efficiently.

Optics and Precision Engineering
Jun. 25, 2023, Vol. 31 Issue 12 1827 (2023)
Railway few-shot intruding-object detection method with metric meta-learning
Baoqing GUO, and Defen ZHANG

Object intrusion is among the primary causes of railway accidents. Traditional deep-learning methods typically require numerous samples for network training; however, intrusion samples in railway settings are scarce and difficult to obtain. Thus, in this paper, a railway few-shot intruding-object detection method based on an improved metric meta-learning network is proposed. To better exploit the features of intruding objects during classification, a feature-extraction network based on the channel attention mechanism is proposed. A class-center fine-tuning network is proposed for class-center correction, solving the problem of individual samples deviating in the feature space when samples are insufficient. Additionally, a center-correlation loss function based on the center loss and cross entropy is constructed for few-shot network training to improve the compactness of the same-class feature distribution in the feature space. In experiments on the public few-shot dataset miniImageNet, the accuracy of the proposed method is 7.31% higher than the best accuracy of classical few-shot learning models. In five-way five-shot ablation experiments using a railway dataset, the proposed channel attention mechanism and center-correlation loss function increase the mean average precision (mAP) by 0.86% and 1.91%, respectively. Additionally, center fine-tuning and pretraining increase the mAP by 3.05% and 6.70%, respectively, for a total mAP improvement of 7.90%.
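The center loss component of the loss described above is standard; a minimal sketch follows, with the combination weight left to the caller and the learnable-center parameterization as an illustrative assumption.

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Center loss: pull each sample's embedding toward its class center so
    same-class features become compact; in the paper this term is combined
    with cross entropy into a center-correlation loss."""
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats, labels):
        # mean squared distance between embeddings and their class centers
        return ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()
```

A typical total loss would then be `loss = F.cross_entropy(logits, y) + lam * center_loss(feats, y)` for some small weight `lam`.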

Optics and Precision Engineering
Jun. 25, 2023, Vol. 31 Issue 12 1816 (2023)
Multi-scale YOLOv5 for solar cell defect detection
Yafang CHEN, Fei LIAO, Xinyu HUANY, Jing YANG, and Hengxiang GONG

Herein, to realize high-precision detection of crack and break defects in solar cells under electroluminescence (EL) conditions, a multi-scale You Only Look Once version 5 (YOLOv5) model is used for solar-cell defect detection under real industrial conditions. First, an improved feature-extraction network combining deformable convolution version 2 (DCNv2) and coordinate attention (CA) is proposed to widen the receptive field for small target defects and enhance the extraction of small-scale defect features. Second, an improved path aggregation network (PANet), called CA-PANet, is proposed, integrating CA and cross-layer cascades in the path aggregation network to reuse shallow features. Notably, CA-PANet combines deep and shallow features to enhance the feature fusion of defects at different scales, improve the feature representation of defects, and increase the defect detection accuracy. The low computational cost of the lightweight CA ensures the real-time performance of the model. Experimental results indicate that the mean average precision (mAP) of the YOLOv5 model combining DCNv2 and CA reaches 95.4%, which is 3% higher than that of the YOLOv5 baseline and 1.4% higher than that of the YOLOX model. The improved YOLOv5 model achieves a frame rate of up to 51 frames per second (FPS), meeting industrial real-time requirements. Compared with other algorithms, the improved YOLOv5 model can accurately detect crack and break defects in EL images of solar cells, satisfying the demand for real-time, high-precision defect detection in photovoltaic power plants.
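Below is a minimal sketch of the published coordinate attention operator (Hou et al., 2021) referenced above; the reduction ratio and activation are illustrative, and normalization layers are omitted for brevity.

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    """Coordinate attention: pool along H and W separately, encode jointly,
    then re-weight the feature map with direction-aware attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        ph = x.mean(dim=3, keepdim=True)                       # (b, c, h, 1)
        pw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (b, c, w, 1)
        y = self.act(self.conv1(torch.cat([ph, pw], dim=2)))   # (b, mid, h+w, 1)
        yh, yw = y.split([h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                        # (b, c, h, 1)
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))   # (b, c, 1, w)
        return x * ah * aw
```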

Optics and Precision Engineering
Jun. 25, 2023, Vol. 31 Issue 12 1804 (2023)
Binocular vision measurement method incorporating one-dimensional probabilistic Hough transform and local Zernike moment
Hao ZHANG, Sixiang XU, Chenchen DONG, and Shuhua ZHOU

To address the low measurement accuracy resulting from the inability to detect ideal corners of an object, a binocular-vision-based measurement method incorporating the one-dimensional probabilistic Hough transform and local Zernike moments is proposed herein. First, the one-dimensional probabilistic Hough transform is used for line detection on the outer contour. Next, sub-pixel extraction is performed using the Zernike moment method in a region of interest (ROI) established according to the detected lines, and sub-pixel points are screened in the intersection regions of the ROI. Then, before the key points are matched, sub-pixel edge lines are fitted using the orthogonal total least squares method. Finally, the three-dimensional spatial information of a continuous casting slab model, taken as the measurement object, is obtained via the triangulation principle, and the measurement is completed. Experimental results indicate that the minimum relative error of the proposed algorithm is 0.340 1%, which satisfies the measurement requirement. The average relative error in length is 0.394 5%, which is 80.01% and 74.63% smaller than those of the traditional SIFT and ORB algorithms, respectively. Compared with another method based on edge fitting, the measurement error and time consumption of the proposed algorithm are reduced by 34.11% and 39.07%, respectively, confirming its accuracy and efficiency.
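The orthogonal total least squares line fit mentioned above has a closed-form solution via SVD; a minimal sketch is shown below.

```python
import numpy as np

def fit_line_tls(points: np.ndarray):
    """Orthogonal total least squares line fit: the line through the
    centroid along the first principal direction minimizes the sum of
    squared *perpendicular* distances, unlike ordinary least squares,
    which only minimizes vertical residuals."""
    centroid = points.mean(axis=0)
    _, _, vh = np.linalg.svd(points - centroid)
    direction = vh[0]                  # dominant right singular vector
    return centroid, direction         # line: p(t) = centroid + t * direction
```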

Optics and Precision Engineering
Jun. 25, 2023, Vol. 31 Issue 12 1793 (2023)
Modeling development method for slave controller of real-time Ethernet fieldbus
Lingyu CHEN, Jieji ZHENG, Aihua HE, and Dapeng FAN

The slave controller is a fundamental component for realizing bus communication, field sensor acquisition, and the control of motors and other actuators in industrial automation control systems. To meet the needs of various industrial field actuators and sensors and to solve the problems of cumbersome hardware and software design, low development efficiency, and difficult upgrading and transplantation of slave controllers, this paper proposes a model-based development method for slave controllers built on an analysis of the hardware and software components of typical slave controllers. A common real-time Ethernet communication processing model for slave controllers and device control models for digital input/output (I/O), analog I/O, motion control, and communication interface conversion slaves in practical applications are established, and the model code is automatically generated and ported to realize the standardized and rapid development of slave controller software on a domestic hardware platform. The experimental results verify the effectiveness of the proposed method, which provides a new approach for the rapid development and upgrading of localized slave controller software.

Optics and Precision Engineering
Jun. 10, 2023, Vol. 31 Issue 11 1710 (2023)
Fast extraction of buildings from remote sensing images by fusion of CNN and Transformer
Yunzuo ZHANG, Wei GUO, and Cunyu WU

The efficient extraction of buildings from remote sensing images plays an important role in urban planning, disaster rescue, and military reconnaissance. Building extraction methods based on deep learning have made significant progress in accuracy, with the sparse token transformer network (STTNet) achieving particularly high accuracy. However, these methods are usually implemented with complex convolution operations in extremely large network models, which results in low extraction speed and makes practical needs difficult to satisfy. Therefore, in this study, a method is designed for the fast extraction of buildings from remote sensing images. First, multi-scale convolution is introduced into the feature extraction network of the STTNet model, whereby multi-scale features are extracted within the same convolution layer to further improve the feature extraction capability of the model. Second, channel attention is applied to the feature map output by the backbone network so that the channel attention weights are learned effectively, addressing the instability of channel attention weights computed from the learned backbone feature map. Finally, to reduce the number of model parameters and speed up the model, the STTNet structure is changed from parallel to serial. Experiments on the INRIA building dataset show that, while maintaining accuracy and intersection over union (IoU) comparable to current mainstream methods, the proposed method is 18.3% faster than STTNet.
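The same-layer multi-scale convolution described above is commonly realized with parallel branches of different kernel sizes; a minimal sketch follows, with branch counts and kernel sizes as illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    """Multi-scale feature extraction within one layer: parallel branches
    with different kernel sizes share the input, and their outputs are
    concatenated along the channel axis."""
    def __init__(self, in_ch, out_ch, scales=(1, 3, 5)):
        super().__init__()
        assert out_ch % len(scales) == 0
        branch_ch = out_ch // len(scales)
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in scales
        )

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)
```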

Optics and Precision Engineering
Jun. 10, 2023, Vol. 31 Issue 11 1700 (2023)
Image correction for perspective projection distortion of cylindrical surface
Xiaohua XIA, Yuqiang LI, Yiqing ZOU, Yongbiao HU, and Lijun JIANG

To address the inaccurate measurement of cylindrical surface defects caused by perspective projection in machine vision systems, an image correction method for cylindrical surface perspective projection distortion is proposed. In this method, the image region of the cylinder is first extracted, and the transverse and axial directions are determined. On the basis of the perspective projection characteristics of the cylindrical surface, the distortion is divided into axial and transverse deformation. The imaging parameters and cylinder radius are used to establish the correspondence between the coordinates of the original image and those of the corrected image, and the perspective projection distortion is corrected by pixel mapping with nearest-neighbor interpolation. The experimental results demonstrate that the proposed method achieves a good correction effect on images of cylindrical surfaces of different diameters. Both the near-large, far-small perspective deformation and the oblique projection deformation are eliminated in the corrected images. In the checkerboard simulation correction, the measurement error for the side length of the six cylindrical checkerboard squares decreases from a maximum of 14.9% before correction to 1.2% after correction. In a scratch measurement, the maximum errors in the original images of two cylinders of different diameters are 78.0% and 61.8%; after correction using the proposed method, the maximum errors are only 5.9% and 5.5%, respectively. This remarkable correction effect verifies the effectiveness of the method.
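To give a feel for the pixel-mapping idea, the sketch below unrolls the transverse direction of a cylinder under a simplified head-on, near-orthographic viewing assumption; the paper's full model additionally uses the imaging parameters to correct axial (near-large, far-small) and oblique deformation, so this is an illustration, not the authors' mapping.

```python
import numpy as np

def unwrap_cylinder(img, radius_px, cx):
    """Illustrative transverse unrolling: the corrected column at arc
    position s maps back to source column cx + R*sin(s/R); pixels are
    filled by nearest-neighbor sampling (simplified head-on geometry)."""
    h, w = img.shape[:2]
    half_arc = int(np.pi / 2 * radius_px)        # quarter circumference
    out = np.zeros((h, 2 * half_arc) + img.shape[2:], dtype=img.dtype)
    for j in range(out.shape[1]):
        s = (j - half_arc) / radius_px           # arc angle in radians
        src = int(round(cx + radius_px * np.sin(s)))
        if 0 <= src < w:
            out[:, j] = img[:, src]              # nearest-neighbor mapping
    return out
```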

Optics and Precision Engineering
Jun. 10, 2023, Vol. 31 Issue 11 1691 (2023)
Small sample data augmentation and abundances inversion of minerals hyperspectral
Ling ZHU, Ming LI, and Kai QIN

Using deep learning methods to retrieve mineral abundance requires numerous labeled hyperspectral data samples. Thus, a method based on the Hapke mixing model with a filling factor is proposed for the data augmentation of small mineral-sample sets, generating a large number of labeled data. First, five common mineral powders were mixed in the laboratory in multi-component proportions by weight, and the spectra of the mixed minerals were measured. Subsequently, mixed spectra were simulated at the corresponding weight proportions using five mixing models, including the linear mixing model; the simulated spectra produced by the original Hapke model and by the Hapke mixing model with filling factors of 0.1, 0.2, and 0.3 were compared with the measured spectra. Finally, based on sum-to-one abundance matrices randomly generated by the Monte Carlo method, 40 000 simulated spectra were generated using the five mixing models, and the abundance information of real spectral data was obtained by using the simulated spectra as the training dataset of a stacked autoencoder network. The results showed that the simulations obtained using the original Hapke model and the models with filling factors were more accurate than those of the linear mixing model. With filling factors of 0.1 and 0.2, the mean SAM errors were 0.053 5 and 0.053 7, respectively, and the RMSE of mineral abundance inversion from the hyperspectral data was 0.124 8, demonstrating the superiority of the Hapke model with filling factors over the other four methods; the simulated mineral spectra were closer to the measured spectra than those of the model without a filling factor, whose simulation error was 0.074 8. This provides support for mineral abundance inversion research based on deep learning.
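The Monte Carlo generation step can be sketched with the simplest of the five models, the linear mixing model; the Hapke variants instead mix in single-scattering albedo space, which is not shown here, and the Dirichlet draw is an illustrative way to obtain sum-to-one abundances.

```python
import numpy as np

def simulate_linear_mixtures(endmembers, n_samples=40000, seed=0):
    """Monte Carlo augmentation with the linear mixing model: draw
    abundance vectors that are non-negative and sum to one, then mix
    the endmember spectra. endmembers: (k, n_bands)."""
    rng = np.random.default_rng(seed)
    k = endmembers.shape[0]
    abundances = rng.dirichlet(np.ones(k), size=n_samples)   # (n, k), rows sum to 1
    spectra = abundances @ endmembers                        # (n, n_bands)
    return abundances, spectra
```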

Optics and Precision Engineering
Jun. 10, 2023, Vol. 31 Issue 11 1684 (2023)
Measurement uncertainty evaluation and analysis for industrial computed tomography based on forest balls
Zenan YANG, Ziying HUANG, Haiyong ZHA, Liyuan ZHOU, and Kuidong HUANG

Non-metallic forest balls are often used to evaluate the measurement uncertainty of industrial computed tomography (CT), whereas the materials and scales of actually inspected parts differ greatly, which introduces problems of insufficient applicability and reliability. In this study, the uncertainty of industrial CT measurement was evaluated using forest balls made of different materials, and the effects of the materials on the uncertainty were examined. First, we designed and fabricated three types of standard forest balls of different materials, covering the commonly used measurement range of industrial CT, and calibrated them using a coordinate measuring machine (CMM). Then, we evaluated and analyzed the measurement uncertainty of the ball diameters and center distances based on industrial CT scanning and measurement. The results indicated that the influence of the material on the expanded uncertainty of diameter measurement was insignificant. The expanded uncertainty of center distance measurement increased with the ball center distance, and the expanded uncertainty of the non-metallic balls, including the ceramic and ruby balls, was essentially the same, approximately 0.003 5 mm; that of the steel ball was 3.4 times larger, reaching 0.012 2 mm. A certain systematic error exists in ball center distance measurement; however, the influence of material on the standard forest balls is not evident. Center distance measurement based on forest balls of different materials has engineering value for the selection and calibration of industrial CT measurement uncertainty evaluation.

Optics and Precision Engineering
Jun. 10, 2023, Vol. 31 Issue 11 1672 (2023)
Textile defect recognition network based on label embedding
Ying LIU, Wei JIANG, Guandian LI, Lei CHEN, and Shuang ZHAO

A convolutional neural network (CNN) can be used in industrial production environments to identify and classify textile defects. To overcome the small visual differences between defect types and the imbalance of textile defect categories in actual scenes, a textile defect recognition network (TDRNet) based on label embedding is proposed. First, the backbone structure is adjusted to improve the classification accuracy of the model. Then, a label embedding module (LEM) is constructed to generate the category weight offsets of the model. Subsequently, a distribution perception loss function (DP loss) is proposed to adjust the class distribution of the algorithm, reducing the distance between homogeneous defect features and increasing the distance between heterogeneous features. Finally, the seesaw loss function is introduced to dynamically balance the gradient updates of different samples during training by suppressing the negative-sample gradients of minority categories and increasing the sample loss upon misclassification, thereby reducing the misclassification rate of minority categories. On the self-made "Guangdong intelligent manufacturing" cloth defect classification dataset, the top-1 error rates of our framework for coarse-grained and fine-grained classification were 16.35% and 17.12%, respectively, whereas the top-5 error rate for fine-grained classification was as low as 5.20%. Compared with other classification models, TDRNet achieved the best results. In addition, TDRNet was compared with classical fine-grained classification models from the last five years and achieved state-of-the-art (SOTA) performance, fully demonstrating the effectiveness of the proposed enhancements.

Optics and Precision Engineering
May. 25, 2023, Vol. 31 Issue 10 1563 (2023)
Infrared and visible image fusion based on fast alternating guided filtering and CNN
Yanchun YANG, Yongping LI, Jianwu DANG, and Yangping WANG

To solve the problems of detail loss, blurred edges, and artifacts in infrared and visible image fusion, this paper proposes a fast alternating guided filter that significantly increases operational efficiency while ensuring the quality of the fused image; it is combined with a convolutional neural network (CNN) and effective infrared feature extraction for fusion. First, quadtree decomposition and Bessel interpolation are used to extract the infrared brightness features of the source images, and an initial fusion image is obtained by combining them with the visible image. Second, the base-layer and detail-layer information of the source images is obtained through fast alternating guided filtering; the base layers are fused through the CNN and the Laplace transform, and the detail layers are fused through a saliency measurement method. Finally, the initial, base, and detail fusion images are added to obtain the final fusion result. Owing to the fast alternating guided filtering and the feature extraction performance of this algorithm, the final fusion result contains rich texture details and clear edges. The experimental results indicate that the fusion results have good visual fidelity, and compared with those of other methods, the objective evaluation indicators, namely information entropy, standard deviation, spatial frequency, wavelet feature mutual information, visual fidelity, and average gradient, improve by 9.9%, 6.8%, 43.6%, 11.3%, 32.3%, and 47.1%, respectively, on average.

Optics and Precision Engineering
May. 25, 2023, Vol. 31 Issue 10 1548 (2023)
Dense pedestrian detection algorithm based on a multi-branch anchor-free network
Zhixuan LÜ, Xia WEI, and Deqi HUANG

To address missed detections in dense pedestrian images, a multi-branch anchor-free network (MBAN) detection method is proposed to handle the varied postures and severe human occlusion in multi-person traffic scenes such as streets. First, a multi-branch network structure is added after the backbone network to detect the local features of multiple key pedestrian regions. Subsequently, a distance loss function between key regions is designed to guide the branch network to differentially learn the local detection positions of pedestrians. Thereafter, four up-sampling blocks are added to the tail of the ResNet50 network to form an hourglass structure, improving the branch network's ability to understand the spatial information of local pedestrian features. Finally, a local feature selection network is designed to adaptively suppress non-optimal values of the multi-branch output and eliminate redundant feature boxes in prediction. In the experiments, the mAP, F1-score, precision, and recall of the MBAN method for pedestrian detection in multi-person scenes reached 85.22%, 0.87, 80.07%, and 94.39%, respectively. Therefore, this method is effective in detecting pedestrians in dense crowds and has a higher recall rate than other pedestrian detection algorithms.

Optics and Precision Engineering
May. 25, 2023, Vol. 31 Issue 10 1532 (2023)
Matching method of cultural relics fragments based on multi-feature parameters fusion
Fuqun ZHAO, and Mingquan ZHOU

To address the low accuracy of single-geometric-feature-based matching methods for cultural relic fragments, an automatic matching method based on multi-feature parameter fusion is proposed herein. First, a segmentation algorithm is used to extract the fracture surfaces of the fragments, and four characteristic parameters are computed for the points on these surfaces: the average distance from a point to its neighborhood points, the distance from a point to the center of gravity of its neighborhood, the curvature, and the average included angle of the neighborhood normals. The four feature parameters are then fused into a feature discrimination parameter, and a feature point set is extracted by thresholding this parameter. Finally, the iterative closest point algorithm with a scale factor is used to match the feature point sets, achieving accurate matching of the fragment fracture surfaces. In the experiments, a point cloud data model of Terracotta Warriors fragments was used to verify the performance of the proposed method. The results reveal that the method overcomes the low accuracy of single-geometric-feature-based matching: compared with the existing algorithm, its matching accuracy is improved by more than 15%, and its time efficiency is improved by more than 20%. Therefore, the multi-feature-parameter-fusion-based matching method is effective for cultural relic fragment matching.
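Two of the four per-point parameters above can be sketched directly with a k-nearest-neighbor search; the neighborhood size is an illustrative assumption, and curvature and the normal-angle parameter would come from a local PCA in the same loop.

```python
import numpy as np
from scipy.spatial import cKDTree

def point_features(points, k=16):
    """Per-point parameters from a k-NN neighborhood: mean distance to the
    neighbors and distance to the neighborhood centroid. points: (n, 3)."""
    tree = cKDTree(points)
    dists, idx = tree.query(points, k=k + 1)   # first hit is the point itself
    mean_dist = dists[:, 1:].mean(axis=1)
    centroids = points[idx[:, 1:]].mean(axis=1)
    dist_to_centroid = np.linalg.norm(points - centroids, axis=1)
    return mean_dist, dist_to_centroid
```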

Optics and Precision Engineering
May. 25, 2023, Vol. 31 Issue 10 1522 (2023)
Full reference image quality assessment based on color appearance-based phase consistency
Benchi JIANG, Shilei BIAN, Chenyang SHI, and Lulu WU

To improve the accuracy of image quality assessment, a full-reference image quality assessment (IQA) model based on color appearance-based phase consistency is proposed. First, image structure information is extracted from vividness, an index of color appearance in the CIELAB color space, to obtain a color appearance-based phase consistency value. Subsequently, the contrast similarity map is calculated using the root-mean-square method, and the chroma similarity map is obtained through the color channels of the color space. Finally, the three image features of phase consistency, contrast, and chromaticity are combined, and the standard deviation method is used for pooling, realizing the full-reference IQA model. To verify the reliability of this model, experiments were conducted on distorted images in four common image databases, and prediction accuracy, computational complexity, and generalization were evaluated using four criteria. In the experimental results, the Pearson linear correlation coefficient of the model was lowest on TID2013 at 0.878 1 and highest on LIVE at 0.961 6; the Spearman rank-order correlation coefficient was lowest on TID2013 at 0.859 2 and highest on LIVE at 0.965 3. Compared with many existing methods, the proposed IQA model predicts visual quality more accurately.

Optics and Precision Engineering
May. 25, 2023, Vol. 31 Issue 10 1509 (2023)
3D vehicle detection for unmanned driving system based on lidar
Xiru WU, and Qiwei XUE

This paper proposes a 3D vehicle detection algorithm for unmanned driving systems to solve the problem of low accuracy in lidar-based environmental perception. First, ground point cloud segmentation using statistical filtering and the random sample consensus (RANSAC) algorithm was analyzed to eliminate redundant points and outliers from the lidar data. Second, we improved the 3DSSD deep neural network to extract vehicle semantic and distance information from the point cloud through fusion sampling. According to the feature information, the candidate point positions were adjusted twice to generate center points, and the 3D center-ness assignment strategy was adopted to create the 3D vehicle detection boxes. Finally, we divided the KITTI dataset into different scenes as experimental data and compared the proposed method with various current 3D vehicle detection algorithms. The experimental results showed that the proposed method detects vehicles quickly and accurately, with an average detection time of 0.12 s and a highest detection accuracy of 89.72%.
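The preprocessing stage described above maps naturally onto standard point cloud tooling; below is a minimal sketch using Open3D, with the file name and all thresholds as illustrative assumptions rather than the paper's settings.

```python
import open3d as o3d

# Illustrative ground removal: statistical outlier filtering followed by
# RANSAC plane fitting (file name and thresholds are hypothetical).
pcd = o3d.io.read_point_cloud("frame.pcd")
pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
plane, inliers = pcd.segment_plane(distance_threshold=0.2,
                                   ransac_n=3, num_iterations=1000)
ground = pcd.select_by_index(inliers)                 # fitted ground plane
objects = pcd.select_by_index(inliers, invert=True)   # points fed to the detector
```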

Optics and Precision Engineering
Feb. 25, 2022, Vol. 30 Issue 4 489 (2022)
Two-step calibration for vision measurement system with large field of view and high depth
Hao HU, Bin WEI, Jin LIANG, Huigang WANG, and Yongqing ZHANG

To address the problems of low calibration accuracy, difficulty in fabricating large targets, and complicated operation in engineering fields, a two-step camera calibration method for vision measurement systems with a large field of view is proposed and implemented based on industrial close-range photogrammetry, and the perspective imaging model of the camera is investigated. First, at close range, the internal parameters are calibrated using a small-scale cross target and the pyramid method. Second, at far range, several coded mark points are arranged in the measured space, and the external parameters are calculated based on the principle of single-image intersection. Finally, all internal and external parameters are optimized using bundle adjustment. To verify the feasibility and accuracy of the proposed method, a vision experiment with a large field of view was carried out. Experimental results show that the re-projection error is less than 0.08 pixels, the maximum absolute error of the three-dimensional measurement is 0.43 mm, and the pitch angle error of a rotor with a diameter of 10 m is less than 0.1°. The method can therefore calibrate the internal and external parameters separately in the field.

Optics and Precision Engineering
Feb. 25, 2022, Vol. 30 Issue 4 478 (2022)
Single image dehazing with sky segmentation and haze density estimation
Jianwei LV, Feng QIAN, Haonan HAN, and Bao ZHANG

To solve the unnatural restoration of sky areas and the imprecise estimation of haze density, a dehazing algorithm based on sky segmentation and haze density estimation is proposed. First, to improve the precision of transmission estimation and the quality of dehazing, gradient and brightness thresholds are used to segment the sky region. Next, an adaptive dark channel prior and a quadtree subdivision method are utilized to estimate the atmospheric light. Finally, different transmission estimation methods are used for the sky and non-sky regions: a bright channel prior is used in the sky region, and a linear haze density estimation model is proposed for the non-sky region. The final transmission is obtained by combining the pixel probability distribution with edge refinement using a guided filter, and the recovered image is obtained using the atmospheric scattering model. Experimental results show that the dehazed images perform well in terms of subjective and objective quality evaluation. The proposed algorithm restores a more natural sky and dehazes more thoroughly, improving the clarity of image details. Its operating speed is similar to that of current algorithms, and it is more stable than traditional algorithms across different hazy scenes.
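The quadtree subdivision step for atmospheric light is a well-known pattern; a minimal sketch on a grayscale (or dark channel) image follows, with the stopping size as an illustrative parameter.

```python
import numpy as np

def estimate_airlight(gray, min_size=32):
    """Quadtree subdivision for atmospheric light: repeatedly keep the
    quadrant with the highest mean brightness until the block is small,
    then take its mean as the atmospheric light estimate."""
    y0, y1, x0, x1 = 0, gray.shape[0], 0, gray.shape[1]
    while min(y1 - y0, x1 - x0) > min_size:
        ym, xm = (y0 + y1) // 2, (x0 + x1) // 2
        quads = [(y0, ym, x0, xm), (y0, ym, xm, x1),
                 (ym, y1, x0, xm), (ym, y1, xm, x1)]
        # descend into the brightest quadrant
        y0, y1, x0, x1 = max(quads,
                             key=lambda q: gray[q[0]:q[1], q[2]:q[3]].mean())
    return gray[y0:y1, x0:x1].mean()
```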

Optics and Precision Engineering
Feb. 25, 2022, Vol. 30 Issue 4 464 (2022)
Research on attitude measurement technology of free flying model under window distortion correction
Lei CHEN, Yang XU, Xiaobin XU, Tao ZHU, Xiaoyu MA, Fei XIE, and Chao HE

During hypersonic wind tunnel tests, the pose information of a model underpins the reliability of the experimental data, and its measurement accuracy has a significant impact on the test results. Binocular vision measurement systems can measure the attitude parameters of free-flight test models in a hypersonic wind tunnel. Such a system is usually placed outside the wind tunnel test section and observes the model through a glass window in a wall of the test section; however, the glass window distorts the image and reduces the measurement accuracy. This paper proposes a measurement method that corrects the glass window distortion. By modeling the image distortion caused by refraction through the glass window, a correction method based on the linear fitting of marker points is proposed to eliminate the image distortion and improve the measurement accuracy. Moreover, a binocular vision measurement system was developed at the Φ1 m hypersonic wind tunnel test site, and the six-degree-of-freedom attitude of a free-flight model was successfully measured. The results show that the measurement accuracy was better than 0.5 mm over a measuring range of 1 m × 1 m × 1 m. Therefore, the system satisfies the requirements of subsequent aerodynamic data analysis.

Optics and Precision Engineering
Feb. 25, 2022, Vol. 30 Issue 4 455 (2022)
Adaptive Canny operator edge detection under strong noise
Yuhan LIU, He YAN, Zaozao CHEN, Xiaotang WANG, and Junbin HUANG

The traditional Canny operator cannot effectively filter out the salt-and-pepper noise generated during image decoding and transmission, and it cannot retain edge details. To overcome this, an improved Canny image edge detection algorithm for operation under strong noise is proposed. According to the extreme values and gray differences of salt-and-pepper noise, pixels are divided into noise points and suspected noise points. The size and weights of the filter window are adaptively changed according to the classified pixels, which reduces the influence of noise while retaining image details. Then, Sobel operators with eight directional templates are introduced to calculate the gradient amplitude and improve edge localization after filtering. Finally, an iterative adaptive threshold algorithm and the Otsu algorithm are used to select the best thresholds, achieving adaptive threshold setting and improving edge connection. The results of the comparative experiments show that after denoising a noisy image, the structural similarity is 0.949, the peak signal-to-noise ratio is 10.97 dB higher than that of the traditional algorithm, the average edge evaluation is increased by 27.2%, and the F1 value is increased by 34.6%. The proposed algorithm retains the excellent performance of the Canny operator, effectively removes salt-and-pepper noise, and better protects edge details.
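A much-simplified version of the noise-classification idea is sketched below: extreme-valued pixels are treated as salt-and-pepper candidates and replaced by a local median, leaving other pixels untouched. The paper additionally adapts the window size and weights, which is omitted here; thresholds are illustrative.

```python
import numpy as np
from scipy.ndimage import median_filter

def remove_salt_pepper(img, low=5, high=250):
    """Replace only gray-level-extreme pixels (salt/pepper candidates)
    with a local median, so edge detail elsewhere is preserved."""
    noise = (img <= low) | (img >= high)
    med = median_filter(img, size=5)
    out = img.copy()
    out[noise] = med[noise]
    return out
```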

Optics and Precision Engineering
Feb. 15, 2022, Vol. 30 Issue 3 350 (2022)
Multi-stage boundary reference network for action segmentation
Lin MAO, Zhe CAO, Dawei YANG, and Rubo ZHANG

Over-segmentation leads to incorrect predictions and reduces segmentation quality in existing action segmentation algorithms. To address this, a reference to video action boundary information is independently introduced at each stage of the backbone, which is based on a multi-stage temporal convolutional network. To avoid the model solidification caused by applying the same boundary information at all stages, a weight-adjusting block composed of multilayer parallel convolutions is proposed to adjust the boundary values involved in the output calculation of each stage and to process different samples differently. The reference from the adjustable boundary information is used to smooth the output of each stage along the time sequence, significantly reducing over-segmentation errors. Experimental results show that the proposed method outperforms existing methods on three video action segmentation datasets: GTEA, 50Salads, and Breakfast. Compared with the boundary-aware cascade network (BCN) algorithm, the segmentation edit score is increased by 1.7% on average, and the harmonic mean of precision and recall (F1 score) is increased by 1.5% on average.

Optics and Precision Engineering
Feb. 15, 2022, Vol. 30 Issue 3 340 (2022)
Detection of leaky cable fixture in high-speed railway tunnel with layered continuous gradient fusion feature
Yunzuo ZHANG, Zhouchen SONG, Wei GUO, and Xu DONG

Deep mining algorithms and multi-feature fusion algorithms based on local binary patterns are effective methods for extracting the fixture features of leaky cables in railway tunnels; however, their descriptors are insufficiently expressive and their feature dimensions are too high. In this paper, the layered continuous gradient local binary pattern (LCG-LBP) is proposed, which realizes the scale transformation of leaky cable fixture features, reduces the feature dimension of the fusion descriptor extracted from down-sampled feature maps, and effectively improves the classification accuracy for faulty fixture images. First, an improved algorithm based on the center-symmetric local binary pattern (CS-LBP), with an adaptive threshold obtained from the global gray average, is used to calculate the gradient direction features within circular neighborhood units, yielding a complete preliminary gradient direction feature map. Then, two consecutive down-sampling iterations are performed on this preliminary feature map, and continuous gradient features are extracted from the two down-sampled feature maps. Finally, the two layers of continuous gradient features at different scales are concatenated into a fusion descriptor, and a support vector machine (SVM) completes the defect detection process on faulty cable fixture images obtained from railway tunnels. The experimental results show that the recall and accuracy of the proposed algorithm are 0.923 and 0.857, respectively, demonstrating obvious advantages over the local binary pattern (LBP), CS-LBP, and other variants.
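For reference, the base CS-LBP operator that LCG-LBP improves on can be sketched in a few lines on a 3×3 neighborhood; the threshold value here is illustrative, whereas the paper derives it adaptively from the global gray average.

```python
import numpy as np

def cs_lbp(img, threshold=0.01):
    """Center-symmetric LBP: compare the four center-symmetric pixel
    pairs of a 3x3 neighborhood (instead of each pixel with the center),
    giving a 4-bit (16-level) code per interior pixel."""
    f = img.astype(np.float64)
    pairs = [
        (f[:-2, :-2],  f[2:, 2:]),     # NW vs SE
        (f[:-2, 1:-1], f[2:, 1:-1]),   # N  vs S
        (f[:-2, 2:],   f[2:, :-2]),    # NE vs SW
        (f[1:-1, 2:],  f[1:-1, :-2]),  # E  vs W
    ]
    code = np.zeros((f.shape[0] - 2, f.shape[1] - 2), dtype=np.uint8)
    for bit, (a, b) in enumerate(pairs):
        code |= ((a - b) > threshold).astype(np.uint8) << bit
    return code
```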

Optics and Precision Engineering
Feb. 15, 2022, Vol. 30 Issue 3 331 (2022)
Infrared and visible image fusion based on WEMD and generative adversarial network reconstruction
Yanchun YANG, Xiaoyu GAO, Jianwu DANG, and Yangping WANG

To overcome the blurred edges and low contrast in the fusion of infrared and visible images, a fusion algorithm based on two-dimensional window empirical mode decomposition (WEMD) and generative adversarial network (GAN) reconstruction is proposed. The infrared and visible images are decomposed using WEMD to obtain intrinsic mode function (IMF) components and residual components. The IMF components are fused through principal component analysis, and the residual components are fused by weighted averaging. The preliminary fused image is reconstructed and input into the GAN for adversarial training against the visible image, and some background information is supplemented to obtain the final fused image. The average gradient (AG), edge strength (EI), entropy (EN), structural similarity (SSIM), and mutual information (MI) are used for objective evaluation and increase by 46.13%, 39.40%, 19.91%, 3.72%, and 33.10%, respectively, on average compared with the other five methods. The experimental results show that the proposed algorithm better retains the edge and texture details of the source images while highlighting the infrared targets, offers better visibility, and has obvious advantages in objective evaluation indicators.

Optics and Precision Engineering
Feb. 15, 2022, Vol. 30 Issue 3 320 (2022)
Deep convolutional generative adversarial network algorithm based on improved Fisher's criterion
Hao ZHANG, Guanglei QI, Xiaogang HOU, and Kaimei ZHENG

An improved Fisher's criterion-based deep convolutional generative adversarial network algorithm (FDCGAN) is proposed to solve the problem of quality deterioration in generated images when the training sample size is insufficient or the number of iterations is reduced. In this method, a linear layer is added to the discriminative model to extract category information, and Fisher's criterion is used in backpropagation to combine label and category information. To minimize errors, the weights are adjusted iteratively while maintaining small intra-class and large inter-class distances, so that they rapidly approach the optimal values. A comparison with six recent network models shows that the proposed model achieves better FID scores in all tests. In addition, applying the proposed model to current advanced models in generalization tests yields satisfactory results.

Optics and Precision Engineering
Dec. 25, 2022, Vol. 30 Issue 24 3239 (2022)
Fusion of infrared and visible images via structure and texture-aware retinex
Jianping HU, Mengyun HAO, Ying DU, and Qi XIE

To improve the quality of infrared and visible image fusion, this study proposes a novel method based on structure and texture-aware Retinex (STAR). It first decomposes the source images into reflection and illumination components according to the STAR model; this decomposition accurately separates the texture and structure of the source images and extracts the detailed features of low-luminance visible images. Subsequently, it merges the reflection components using a weight map constructed from the second-order gradients of the source images, and merges the illumination components using a gamma function, which preserves more brightness information in the fused image. Finally, it reconstructs the fused reflection and illumination components to obtain the final fused image. In tests on 38 pairs of widely used images from the TNO infrared and visible image database, the proposed method generated excellent fusion results with high visual quality. Furthermore, compared with five state-of-the-art infrared and visible image fusion methods, it achieved significantly better objective results in mutual information, nonlinear correlation information entropy, and the feature measure based on image phase consistency. This study applies the STAR model to infrared and visible image fusion and establishes a direct Retinex-based fusion framework, improving on existing methods in terms of detail features and global contrast.

Optics and Precision Engineering
Dec. 25, 2022, Vol. 30 Issue 24 3225 (2022)
Robust point cloud registration of terra-cotta warriors based on dynamic graph attention mechanism
Linqi HAI, Guohua GENG, Xing YANG, Kang LI, and Haibo ZHANG

Current point cloud registration methods cannot effectively address resolution mismatches, partially overlapping point clouds, and numerous noise points when applied to cultural relic models such as the Terra-cotta Warriors. Hence, a ResUNet registration model based on a dynamic graph attention mechanism is proposed. The model integrates a residual module into the U-Net, performs three-dimensional (3D) sparse voxel convolution to compute point cloud features, and applies a new normalization technique, batch-neighborhood normalization, to improve the robustness of the features to point-density changes. To improve registration performance, the model aggregates local and context features via self- and cross-attention mechanisms. Finally, a random sample consensus algorithm is used to estimate the transformation matrix between the source and target point clouds to complete the robust registration of the Terra-cotta Warriors model. To verify the effectiveness and robustness of the proposed method, four datasets (3DMatch, 3DLoMatch, 3DMatch with resolution mismatches, and two sets of Terra-cotta Warrior data) were used to test the registration model. Experimental results show that the registration recall reached 90.1% and 61.0% on the 3DMatch and 3DLoMatch datasets, respectively. On the mismatched-resolution 3DMatch dataset, the proposed algorithm improved the registration recall by 5%-20% compared with feature-learning-based registration algorithms. On the Terra-cotta Warrior dataset, the relative rotation and translation errors were less than 0.071 and 0.016, respectively, which are several times to one order of magnitude lower than those of other algorithms. The proposed model can extract key feature information from 3D point clouds and is more robust to variations in point density and overlap than other models.

Optics and Precision Engineering
Dec. 25, 2022, Vol. 30 Issue 24 3210 (2022)
Chained semantic generation network for video captioning
Lin MAO, Hang GAO, Dawei YANG, and Rubo ZHANG

To address the poor semantic expression ability that results in inaccurate text descriptions in video captioning, a chained semantic generation network (ChainS-Net) for video captioning is proposed. A multistage, two-branch, crossing chained feature extraction structure is constructed that uses global and local domain modules as basic units and captures video semantics from global and local visual features, respectively. At each stage of the network, semantic information is transformed and parsed between the global and local domains. This allows visual and semantic information to be cross-referenced and improves the semantic expression ability. Furthermore, a more effective semantic representation is obtained through multistage iterative processing, thereby improving video captioning. Experimental results on the MSR-VTT and MSVD datasets show that the proposed ChainS-Net outperforms other similar algorithms. Compared with the semantics-assisted video captioning network SAVC, ChainS-Net shows an average improvement of 2.5% across four video captioning metrics.

Optics and Precision Engineering
Dec. 25, 2022, Vol. 30 Issue 24 3198 (2022)
3D laser point cloud skeleton extraction via balance of local correlation points
Minquan ZHOU, Chunhui LI, Liqing WANG, Yuhe ZHANG, and Guohua GENG

The shape analysis and shape transformation of laser-scanned point cloud models depend on the curve skeleton. We propose a fast, automatic method for obtaining the curve skeleton of a laser-scanned point cloud, enabling shape transformation of the model and reducing the time consumed by manually binding the skeleton. In this method, an initial skeleton point is defined as the midpoint between a point and its nearest correlation point with a symmetric normal. The final skeleton points are obtained by iterating the initial skeleton points to a balanced position. Principal component analysis is then used to search for combinations of skeleton points that satisfy direction consistency, and a breadth-first search is used to merge the growing skeleton branches. Finally, each branch is smoothed and connected by Laplacian smoothing to obtain a complete skeleton line, and the curve skeleton is used in the task of model shape transformation. The proposed method is compared with the L1-Medial Skeleton method, the Mass-driven Topology-aware Curve Skeleton method, and others, using original scanned point clouds as test data to verify its effectiveness, robustness, and efficiency. The proposed method processes a point cloud of 8 077 points in 0.764 s and a point cloud of 33 041 points in 4.356 s. The extracted curve skeletons are applied to the shape transformation of point clouds, demonstrating the practicability of the method.
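The seed-point construction can be pictured with the following minimal sketch, which, for each point, looks among its nearest neighbors for one whose normal is nearly opposite and takes the midpoint; the neighborhood size and angle threshold are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def initial_skeleton_points(points, normals, k=30, cos_thresh=-0.9):
    """points: (N, 3) array; normals: (N, 3) unit normals."""
    tree = cKDTree(points)
    mids = []
    for p, n in zip(points, normals):
        _, idx = tree.query(p, k=k)
        for j in idx[1:]:                            # idx[0] is the point itself
            if np.dot(n, normals[j]) < cos_thresh:   # nearly opposite normal
                mids.append(0.5 * (p + points[j]))   # midpoint = seed point
                break
    return np.asarray(mids)
```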

Optics and Precision Engineering
Nov. 25, 2022, Vol. 30 Issue 22 2962 (2022)
Rapid star pattern matching for precisely tracking telescopes
Delong LIU, Wenbo YANG, Ming LIU, Zhe KANG, and Zhenwei LI

Star pattern matching is critical to high-precision astronomical positioning, which in turn is required for the precise determination of the orbits of space objects. This paper proposes a rapid star-pattern-matching method for precisely tracking telescopes, covering its principles, program flow, and implementation. First, the star catalog of the pointed sky region was filtered using the shaft position from the encoder; after selection and reduction, the stars in the region were compiled into a navigation star list. An optimized triangle matching method based on dimension reduction and table look-up was then employed for the first frame, in which the relations between plate constants verify the success of matching. The changes in telescope pointing were considered to calculate the standard coordinates of the navigation stars, and the observed stars were matched using the latest plate constants. Finally, the coordinates of the navigation stars calculated using the plate constants were compared with their reduced values from the catalog, and the position of a space object was calculated through celestial positioning. Experimental results indicated that 39 star pairs could be matched by applying the proposed optimized triangle matching in only approximately 1/300 of the time required by the conventional method. When processing the subsequent frames (approximately 100 star pairs per frame), the matching could be finished in less than 0.04 s. The matched star pairs obtained by the rapid star-pattern-matching algorithm were used to locate an MEO laser satellite with an average error of approximately 0.5″. The proposed method therefore fully satisfies the accuracy and speed requirements of astronomical positioning for precisely tracking telescopes.
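The scale- and rotation-invariant triangle descriptor at the heart of such table look-up matching can be sketched as follows; the descriptor rounding and catalog format are assumptions, and a real implementation would restrict the candidate triples rather than enumerate all of them.

```python
import numpy as np
from itertools import combinations

def triangle_descriptor(p1, p2, p3):
    """Reduce a star triple to sorted side ratios, invariant to scale
    and rotation; two ratios fully describe the triangle shape."""
    sides = sorted([np.linalg.norm(p1 - p2),
                    np.linalg.norm(p2 - p3),
                    np.linalg.norm(p3 - p1)])
    return (sides[0] / sides[2], sides[1] / sides[2])

def build_lookup(catalog_xy: np.ndarray, digits: int = 3) -> dict:
    """Hash every catalog triangle by its rounded descriptor so that an
    observed triangle can be matched by a single table look-up."""
    table = {}
    for tri in combinations(range(len(catalog_xy)), 3):
        key = tuple(round(v, digits) for v in
                    triangle_descriptor(*catalog_xy[list(tri)]))
        table.setdefault(key, []).append(tri)
    return table
```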

Optics and Precision Engineering
Nov. 25, 2022, Vol. 30 Issue 22 2952 (2022)
Coarse-to-fine underwater image enhancement based on multi-level wavelet transform
Guoming YUAN, Guang YANG, Jinfeng WANG, Haijun LIU, and Wei WANG

To correct color distortion and enhance the details of degraded underwater images, this paper proposes a coarse-to-fine underwater image enhancement method based on the multi-level wavelet transform. First, a raw underwater image is decomposed into a low-frequency image and a series of high-frequency images using the wavelet transform. Second, a two-stage underwater enhancement network is proposed, comprising a multi-level wavelet transform sub-network and a refinement sub-network built on the proposed second-order Runge-Kutta block. The multi-level wavelet transform sub-network, which estimates a preliminary result, contains a low-frequency branch and a high-frequency branch. Specifically, the low-frequency branch treats color correction as an implicit style transfer problem and introduces instance normalization and position normalization. To ensure accurate reconstruction while the low-frequency information is being manipulated, the high-frequency branch calculates an enhancement mask from both the low- and high-frequency images and implements the enhancement by multiplying the progressively upsampled mask with the high-frequency images. The inverse wavelet transform then yields the preliminary result. Finally, the refinement sub-network further optimizes the preliminary result using the proposed second-order Runge-Kutta block. Experimental results demonstrate that the proposed method outperforms existing methods on both synthetic and real images, with the peak signal-to-noise ratio (PSNR) improved by 9%. The proposed method also meets the requirements of underwater vision tasks in terms of color correction, detail enhancement, and clarity.
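The wavelet split that feeds the two branches can be reproduced with PyWavelets as in the following minimal sketch; the wavelet choice, decomposition level, and stand-in image are assumptions.

```python
import numpy as np
import pywt

img = np.random.rand(256, 256).astype(np.float32)  # stand-in underwater image
coeffs = pywt.wavedec2(img, wavelet='haar', level=3)
low_freq = coeffs[0]            # goes to the color-correction (style) branch
high_freq = coeffs[1:]          # per-level (cH, cV, cD) tuples, mask-enhanced

# After both branches run, the preliminary result is recovered with the
# inverse transform:
preliminary = pywt.waverec2([low_freq] + list(high_freq), wavelet='haar')
```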

Optics and Precision Engineering
Nov. 25, 2022, Vol. 30 Issue 22 2939 (2022)
No-reference video quality objective assessment method based on the content visual perception and transmission distortion
Juncai YAO, Haowei TANG, and Jing SHEN

This paper proposes a video quality assessment (VQA) method based on video-content perception, developed by analyzing the influence of video content, transmission delay, and encoding/decoding distortion on VQA, combined with the characteristics of the human visual system and its mathematical model. In this method, video content is described by the texture complexity, local contrast, and temporal information of video frames, together with their visual perception, from which a video-content perception model is built; this enables investigating the influence of video content and its visual perception on VQA. The relationship between bit rate and video quality is then analyzed and modeled to study the impact of the video bit rate on video quality. Subsequently, a model of the quality degradation caused by transmission delay distortion is designed by considering the characteristics of video transmission delay. Finally, the convex optimization method is used to synthesize the three models above, yielding a no-reference VQA model that accounts for video content, encoding and decoding distortion, transmission delay distortion, and human visual system characteristics. The proposed VQA model was tested on videos from several established and open-source video databases, and its performance was compared with that of 17 existing VQA models. The results show that the Pearson linear and Spearman rank-order correlation coefficients of the proposed model reach minima of 0.877 3 and 0.833 6 and maxima of 0.938 3 and 0.943 8, respectively. This indicates that the model has good generalization performance and low complexity. An analysis of overall efficiency in terms of accuracy, generalization performance, and complexity shows that the proposed model is an excellent VQA model.
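The two reported correlation metrics are standard and can be computed as in this short sketch against subjective mean opinion scores (MOS); the score arrays are purely illustrative.

```python
from scipy.stats import pearsonr, spearmanr

predicted = [3.1, 4.2, 2.5, 4.8, 3.9]   # model outputs (hypothetical)
mos       = [3.0, 4.5, 2.2, 4.9, 3.5]   # subjective scores (hypothetical)

plcc, _ = pearsonr(predicted, mos)       # linear correlation
srocc, _ = spearmanr(predicted, mos)     # rank-order correlation
print(f"PLCC={plcc:.4f}, SROCC={srocc:.4f}")
```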

Optics and Precision Engineering
Nov. 25, 2022, Vol. 30 Issue 22 2923 (2022)
Multi-regularization image restoration method for X-ray images of integrated circuit
Ge MA, Sen LIN, Zhifu LI, Zhijia ZHAO, and Tao ZOU

X-ray images of integrated circuits (IC) generally exhibit high noise and low contrast. Considering the different requirements for detail preservation and noise removal in edge-detail and smooth regions, this paper proposes a multi-regularization image restoration method. First, a Fourier-transform-based Gaussian high-pass filter and Gaussian low-pass filter are employed to obtain edge-detail and smoothed filtering results as new observed images for restoration. Then, a TV-l1 mixed regularization model is designed that makes full use of the advantages of the l1 regularization term in detail preservation and the total variation (TV) regularization term in noise removal. The model addresses the over-smoothing or incomplete denoising caused by using a single regularization term. Experiments on standard and IC X-ray images show that the proposed method can effectively remove noise while retaining more details, laying a foundation for subsequent IC defect detection.
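The Fourier-domain Gaussian split that produces the two observed images can be sketched as follows; the cutoff sigma is an assumption.

```python
import numpy as np

def gaussian_split(img: np.ndarray, sigma: float = 30.0):
    """Split a grayscale image into Gaussian low-pass and high-pass parts
    in the Fourier domain; the two parts sum back to the original."""
    h, w = img.shape
    y, x = np.ogrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    lowpass = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))  # centered mask
    spec = np.fft.fftshift(np.fft.fft2(img))
    low = np.real(np.fft.ifft2(np.fft.ifftshift(spec * lowpass)))
    high = np.real(np.fft.ifft2(np.fft.ifftshift(spec * (1.0 - lowpass))))
    return low, high   # smooth-region and edge-detail observations
```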

Optics and Precision Engineering
Nov. 25, 2022, Vol. 30 Issue 22 2913 (2022)
Camera pose estimation based on 2D image and 3D point cloud fusion
Jia-le ZHOU, Bing ZHU, and Zhi-lu WU

This paper presents an algorithm for estimating the six degree-of-freedom camera pose from a single RGB image in a specific environment, using a combination of known image and point cloud information. Specifically, we propose a multi-stage camera pose estimation algorithm based on dense scene regression. First, the camera pose estimation dataset is constructed by combining depth image information with the structure from motion (SFM) algorithm. Then, for the first time, depth image retrieval is introduced into the construction of two- and three-dimensional (2D-3D) matching points. Using the proposed pose optimization function, a multi-stage camera pose estimation method is developed, in which the ResNet network considerably improves the pose estimation accuracy. Experimental results indicate that the pose estimation accuracy is 82.7% on average on the public 7-Scenes dataset and 94.8% on our own dataset (estimated poses falling within the threshold of 5 cm/5°). Compared with other camera pose estimation methods, our method achieves better pose estimation accuracy on both our own and public datasets.

Optics and Precision Engineering
Nov. 25, 2022, Vol. 30 Issue 22 2901 (2022)
Overview of visual pose estimation methods for space missions
Rui ZHOU, Yanfang LIU, Naiming QI, and Jiayu SHE

With the development of artificial intelligence, target recognition and pose estimation based on computer vision have received widespread attention. Computer-vision-based pose estimation for cooperative targets is already widely used in space missions such as rendezvous and docking. For noncooperative targets, however, complex conditions such as stray-light backgrounds, surface-coating reflections, and dramatic lighting changes make feature extraction and pose estimation difficult. In this paper, the methods and applications of vision-based pose estimation in space missions are summarized. Various target recognition and pose estimation algorithms based on deep learning are systematically outlined, and current deep-learning algorithms in the context of space missions are discussed. Finally, the demands of space missions are analyzed to identify future development trends.

Optics and Precision Engineering
Oct. 25, 2022, Vol. 30 Issue 20 2538 (2022)
Multi-target magnetic positioning with the adaptive fuzzy c-means clustering and tensor invariants
Qingzhu LI, Zhining LI, Zhiyong SHI, and Hongbo FAN

To achieve synchronous positioning of multi-target magnetic dipoles with different locations, moments, and buried depths, a multi-target positioning method based on adaptive fuzzy c-means (AFCM) clustering and tensor invariants is proposed. First, based on the 2D plane grid measurement of a magnetic gradient tensor system, the target distribution area is pre-identified by using the improved tilt angle with the invariants of normalized source strength and tensor contraction. Subsequently, the tensor-derivative invariant-relation positioning method is applied to calculate the initial coordinate points of the magnetic dipoles at grid nodes in the recognition area; these points form a dense point cloud around the real position space of the magnetic source. Finally, the AFCM clustering algorithm is employed to perform 3D clustering on these point clouds of initial position solutions and automatically detect the number of cluster centroids. The estimated number of cluster centroids is the number of targets, and the cluster centroids are the target position coordinates. Then, the tensor matrix and position vector can be used to calculate the magnetic dipole moment. Simulations show that in a Gaussian noise environment with a variance of 5 nT/m, the target-number estimation accuracy of 20 magnetic dipole targets is 100%, horizontal-position estimation accuracy is greater than 91.7%, and buried-depth estimation accuracy is greater than 85.6%. Measurements reveal that the coordinate deviation of the small magnets in the measuring areas of 2.1 m × 2.1 m and 1.2 m × 1.2 m is less than 0.091 m.
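The clustering stage of such a pipeline can be pictured with the compact fuzzy c-means sketch below, which iterates memberships and centroids over the initial position solutions; the adaptive selection of the cluster count used by AFCM is omitted, and the fuzzifier m is an assumption.

```python
import numpy as np

def fcm(points, n_clusters, m=2.0, iters=100, seed=0):
    """Basic fuzzy c-means: points (N, D) -> (centers (C, D), memberships)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), n_clusters, replace=False)]
    for _ in range(iters):
        # Distances of every point to every center, guarded against zeros.
        d = np.linalg.norm(points[:, None] - centers[None], axis=2) + 1e-10
        # Standard membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        u = 1.0 / (d ** (2 / (m - 1)) *
                   np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
        um = u ** m
        centers = um.T @ points / um.sum(axis=0)[:, None]  # weighted means
    return centers, u   # centroids = estimated target positions
```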

Optics and Precision Engineering
Oct. 25, 2022, Vol. 30 Issue 20 2523 (2022)
Adaptive denoising method of steel plate surface image based on BM3D
Yi YANG, Yibo LI, Zhuxi MA, Fengyu CHEN, and Qianbin HUANG

An adaptive block-matching and 3D filtering (BM3D) denoising algorithm based on noise estimation and threshold functions is proposed to make the distance-threshold selection of the traditional BM3D algorithm adaptive and to improve image quality by removing noise from steel plate images. First, a grid search is used to obtain the best distance thresholds of the basic-estimate and final-estimate stages for different plate defect images under different noise intensities. Subsequently, the fitting effects of different functions are compared, and a quadratic threshold function for the basic estimate and a fourth-order polynomial threshold function for the final estimate are determined. Noise estimation is also performed in the processing stage of the new algorithm. Finally, the new BM3D algorithm is compared with the original BM3D algorithm and other recent denoising algorithms. Experiments show that the algorithm performs excellently in restoring the edges and detailed textures of defect images. Under noise with a standard deviation of 30, the peak signal-to-noise ratio and structural similarity of each denoised defect image are above 33 dB and 0.85, respectively. Moreover, residual details in the residual image are reduced relative to those of the other algorithms.

Optics and Precision Engineering
Oct. 25, 2022, Vol. 30 Issue 20 2510 (2022)
Multi-label infrared image classification algorithm based on weakly supervised learning
Chuankai MIAO, Shuli LOU, Ting LI, and Huimin CAI

Scene perception and classification of FLIR images is a key technology in target recognition and is of great significance to infrared reconnaissance and guidance. To address this problem, this study proposes a multi-label infrared image classification algorithm based on weakly supervised learning. First, multi-label image classification is applied to FLIR images, and images of multiple scenes are annotated using weakly supervised techniques; infrared image features are extracted using a ResNet-50 network with a residual structure. Second, a class-specific residual attention (CSRA) module is introduced to capture the different spatial regions occupied by different classes. The CSRA module improves feature expression and realizes inference of the topological relationships between multiple labels. Finally, the asymmetric loss (ASL) function is introduced to address the imbalance between positive and negative labels in multi-label classification: it limits the contribution of negative samples to the loss and focuses training on the positive samples. Experiments show that the algorithm achieves good adaptability and accuracy, with an accuracy exceeding 90%.
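A hedged sketch of the asymmetric-loss idea follows: negatives are probability-shifted and down-weighted so that positives dominate the gradient. The hyperparameter values are illustrative, not the paper's settings.

```python
import torch

def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0, clip=0.05):
    """logits, targets: (N, C) tensors; targets are 0/1 multi-label marks."""
    p = torch.sigmoid(logits)
    p_neg = (p - clip).clamp(min=0)                 # probability shift
    loss_pos = targets * (1 - p) ** gamma_pos * torch.log(p.clamp(min=1e-8))
    loss_neg = (1 - targets) * p_neg ** gamma_neg * \
               torch.log((1 - p_neg).clamp(min=1e-8))
    return -(loss_pos + loss_neg).mean()
```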

Optics and Precision Engineering
Oct. 25, 2022, Vol. 30 Issue 20 2501 (2022)
Multi-scale dense feature fusion network for image super-resolution
Deqiang CHENG, Jiamin ZHAO, Qiqi KOU, Liangliang CHEN, and Chenggong HAN

Existing single-image super-resolution algorithms lose high-frequency details and cannot extract rich image features. Therefore, an image super-resolution reconstruction algorithm based on a multi-scale dense feature fusion network is proposed to exploit image features efficiently. The algorithm extracts image features at different scales through a multi-scale feature fusion residual module with convolution kernels of different sizes, and fuses the features across scales to better preserve the high-frequency details of images. A dense feature fusion structure is adopted between modules to fully integrate the feature information extracted by different modules, avoiding feature loss and yielding better visual quality. Experiments show that the proposed method significantly improves the peak signal-to-noise ratio and structural similarity on four benchmark datasets while reducing the number of parameters. In particular, on the Set5 dataset, the peak signal-to-noise ratio of 4× super-resolution is 0.08 dB higher than that of DID-D5, and the reconstructed images have better visual quality and richer feature information, confirming the effectiveness of the proposed algorithm.

Optics and Precision Engineering
Oct. 25, 2022, Vol. 30 Issue 20 2489 (2022)
Adaptive vignetting correction of corneal nerve microscopy images
Tianyu LI, Guangxu LI, Chen ZHANG, Fangting LI, and Deheng LI

The limited field of view of corneal nerve microscopy images can be mitigated by stitching. However, owing to the vignetting of microscopic images, the stitched images exhibit artifacts at the stitch sites, affecting diagnosis. To solve this problem, this study presents a vignetting correction method based on nonlinear polynomial modeling. First, a vignetting model is established for a single corneal nerve image, constraints consistent with the physical properties of vignetting are imposed, and the model parameters are iteratively optimized using the Levenberg-Marquardt algorithm. During each iteration, the logarithmic information entropy is calculated to assess the correction effect of the current vignetting model and prevent overcorrection of the image. When the iterative optimization ends, the vignetting model is inverted to compensate the original image and complete the correction. A comparison of stitched images before and after correction shows that the corrected images have no obvious vignetting artifacts at the stitch sites. Experiments on images from five patient groups show that the mean values of the mean squared error, peak signal-to-noise ratio, and structural similarity indices of the corrected images reach 0.004 2, 72.225 1, and 0.960 0, respectively, the best correction effect among the compared methods; the correction is significantly better than that of similar algorithms. The proposed method effectively corrects vignetting in corneal images without requiring the camera or environmental brightness parameters to be fixed in advance. The corrected images stitch well, yielding more accurate and clearer corneal nerve mosaics with a larger field of view.
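The parameter fit can be sketched with SciPy's Levenberg-Marquardt solver as below, assuming a radial polynomial falloff g(r) = 1 + a*r^2 + b*r^4 + c*r^6; the radial samples are stand-ins, and the entropy-based acceptance test is outside the snippet.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, r, observed):
    a, b, c = params
    model = 1 + a * r**2 + b * r**4 + c * r**6   # vignetting falloff g(r)
    return model - observed

r = np.linspace(0, 1, 200)                        # normalized radius
observed = 1 - 0.4 * r**2 + 0.05 * r**4           # stand-in measured falloff
fit = least_squares(residuals, x0=[0.0, 0.0, 0.0], args=(r, observed),
                    method='lm')                  # Levenberg-Marquardt
a, b, c = fit.x
# Correction: divide the image by g(r) evaluated at each pixel's radius.
```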

Optics and Precision Engineering
Oct. 25, 2022, Vol. 30 Issue 20 2479 (2022)
Polarization computational imaging super-resolution reconstruction with lightweight attention cascading network
Jie WANG, Guoming XU, Jian MA, Yong WANG, and Yi LI

Deep-learning-based polarization computational imaging incurs greater computational complexity and memory usage as the network depth increases, and its hierarchical feature extraction is often insufficient. To this end, a lightweight polarization computational imaging super-resolution network with cascaded attention is proposed that requires fewer parameters and lower computational complexity while maintaining reconstruction accuracy. First, cascade and fusion connections are used to deepen the representational capability of the convolution layers, effectively transferring shallow features and reducing the number of parameters. Second, a spatial attention adaptive weighting mechanism is designed to extract the spatial content features of the polarization parameters, and a spatial pyramid network is constructed to enhance the polarization feature information under multiple receptive fields. An upsampling module introduces shallow and deep reconstruction paths and generates high-resolution polarization images by fusing the features of the two paths. Finally, information refinement blocks at the end of the network learn finer features and enhance the reconstruction quality. Experiments show that the texture details of the images reconstructed using the proposed method are more abundant. The peak signal-to-noise ratio (PSNR) of two-times super-resolution on the full polarized image set is 45.12 dB, and the number of parameters is approximately 9% of that of a multi-scale residual network (MSRN). The proposed method effectively captures low-frequency feature information in a cascading manner while significantly reducing the number of parameters; combined with the attention pyramid structure for exploring deep features, it realizes efficient super-resolution reconstruction with a lightweight network.

Optics and Precision Engineering
Oct. 10, 2022, Vol. 30 Issue 19 2404 (2022)
Cross-scale infrared pedestrian detection based on dynamic feature optimization mechanism
Shuai HAO, Tian HE, Xu MA, Lei YANG, and Siya SUN

Multi-scale targets and partial occlusions in infrared pedestrian images make accurate detection difficult for traditional algorithms. This study develops a cross-scale infrared pedestrian detection algorithm based on a dynamic feature optimization mechanism. First, to alleviate the difficulty of effectively expressing pedestrian features in complex environments, which results in low detection accuracy, a dynamic feature optimization mechanism is presented: a luminance perception module and an EG-Chimp optimization model are designed to enhance the local contrast of the input image and suppress background information. Second, the CSPdarknet53 structure is used as the backbone feature extraction network, and a CSFF-BiFPN feature pyramid structure with a cross-scale feature fusion module is constructed to improve the detection accuracy for multi-scale and partially occluded pedestrians. Finally, the CIOU loss function is introduced to accelerate network convergence and locate pedestrians more accurately. To verify the advantages of the proposed network, nine classical detection algorithms are selected as baselines and tested on the KAIST dataset. Experimental results demonstrate that the proposed algorithm can accurately detect multi-scale and partially occluded infrared pedestrians in complex environments, with a detection accuracy of up to 90.7%.
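The CIOU term is a standard bounding-box loss; a hedged sketch of its usual formulation (not necessarily the paper's exact implementation) for boxes given as (x1, y1, x2, y2) tensors follows.

```python
import math
import torch

def ciou_loss(b1: torch.Tensor, b2: torch.Tensor, eps: float = 1e-7):
    """CIoU = 1 - IoU + center-distance term + aspect-ratio term."""
    x1, y1 = torch.max(b1[..., 0], b2[..., 0]), torch.max(b1[..., 1], b2[..., 1])
    x2, y2 = torch.min(b1[..., 2], b2[..., 2]), torch.min(b1[..., 3], b2[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    w1, h1 = b1[..., 2] - b1[..., 0], b1[..., 3] - b1[..., 1]
    w2, h2 = b2[..., 2] - b2[..., 0], b2[..., 3] - b2[..., 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # Squared center distance over squared enclosing-box diagonal.
    cw = torch.max(b1[..., 2], b2[..., 2]) - torch.min(b1[..., 0], b2[..., 0])
    ch = torch.max(b1[..., 3], b2[..., 3]) - torch.min(b1[..., 1], b2[..., 1])
    rho2 = ((b1[..., 0] + b1[..., 2] - b2[..., 0] - b2[..., 2]) ** 2 +
            (b1[..., 1] + b1[..., 3] - b2[..., 1] - b2[..., 3]) ** 2) / 4
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) -
                              torch.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```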

Optics and Precision Engineering
Oct. 10, 2022, Vol. 30 Issue 19 2390 (2022)
Global hand pose estimation based on pixel voting
Jingang LIN, Dongnian LI, Chengjun CHEN, and Zhengxu ZHAO

Global hand pose estimation under changing gestures remains a challenging task in computer vision. To address the problem of large errors in this task, a method based on pixel voting was proposed. First, a convolutional neural network with an encoder-decoder structure was established to generate feature maps of semantic and pose information. Second, hand pixel positions and pixel-by-pixel pose voting were obtained from the feature maps using semantic segmentation and pose estimation branches, respectively. Finally, the pose voting of hand pixels was aggregated to obtain the voting result. Simultaneously, to solve the problem of scarcity of global hand pose datasets, a procedure for generating synthetic datasets of the human hand was established using the OpenSceneGraph 3D rendering engine and a 3D human hand model. This procedure could generate depth images and global pose labels of human hands under different gestures. Experimental results show that the average error of global hand pose estimation based on pixel voting is 5.036°, thus verifying that the proposed method can robustly and accurately estimate global hand poses from depth images.

Optics and Precision Engineering
Oct. 10, 2022, Vol. 30 Issue 19 2379 (2022)
Imaging algorithm of dual-parameter estimation through smoke using Gm-APD lidar
Yinbo ZHANG, Haoyang LI, Jianfeng SUN, Sining LI, Peng JIANG, Yue HOU, and Hailong ZHANG

When Geiger-mode avalanche photodiode (Gm-APD) lidar is used to image targets obscured by dense smoke, the strong backscattering and absorption of laser light by the smoke severely limit the ability of traditional algorithms to extract the target signal hidden in the smoke signal. To this end, we propose a Gm-APD lidar imaging algorithm based on dual-parameter estimation for smoke environments. First, this paper introduces a trigger model for Gm-APD lidar and describes the principle of recovering the actual received echo signal from the detection probability. In addition, based on the collision theory of photons and smoke particles, as well as Mie scattering theory, the physical relationship between the two parameters of the gamma model is derived. Second, a dual-parameter estimation algorithm built on the derived relationship is proposed to accurately estimate μ and k. Finally, simulation and indoor experiments are conducted: the simulations verify the correctness of the relationship between μ and k, and the indoor experiments verify the imaging ability of the proposed algorithm in the presence of smoke. The experimental results reveal that, compared with traditional algorithms, the target recovery of the image reconstructed by the proposed algorithm increases by 73%, and the structural similarity increases by 0.228 9. This study thus effectively improves the target perception ability of Gm-APD lidar in smoke environments.

Optics and Precision Engineering
Oct. 10, 2022, Vol. 30 Issue 19 2370 (2022)
Lens-less imaging via score-based generative model
Chunhua WU, Hong PENG, Qiegen LIU, Wenbo WAN, and Yuhao WANG

Lens-less imaging is affected by twin-image noise in in-line holograms, and the reconstructed results suffer from a poor reconstruction signal-to-noise ratio and low imaging resolution. This study proposes a lens-less imaging method based on a score-based generative model. In the training phase, the model perturbs the data distribution by gradually adding Gaussian noise through a continuous stochastic differential equation (SDE). A continuous, time-dependent score function is then trained with denoising score matching and used to solve the reverse SDE that generates object sample data. In the testing phase, a single Fresnel zone aperture is used as a mask to achieve lens-less encoded modulation under incoherent illumination. A predictor-corrector method then alternates between the numerical SDE solver and the data-fidelity step to achieve lens-less imaging reconstruction. Validation on the LSUN-bedroom and LSUN-church datasets shows that the proposed algorithm effectively eliminates twin-image noise, with the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) of the reconstructions reaching 25.23 dB and 0.65, respectively. These PSNR values are 17.49 dB and 7.16 dB higher than those of lens-less imaging algorithms based on traditional back propagation and compressed sensing, respectively, and the corresponding SSIM values are 0.42 and 0.35 higher. The reconstruction quality of lens-less imaging is therefore effectively improved.

Optics and Precision Engineering
Sep. 25, 2022, Vol. 30 Issue 18 2280 (2022)
Two-stage image restoration using improved atmospheric scattering model
Nannan ZHANG, Zhiwei LI, Xinjun GUO, Xinjie XIAO, and Hao RUAN

To address negative effects such as degraded clarity and contrast and color distortion in images acquired in hazy weather, underwater, and at night, a two-stage image restoration method using an improved atmospheric scattering model is proposed. A global compensation coefficient is introduced into the traditional atmospheric scattering model to obtain the improved model, and restoration then proceeds in two stages. In the first stage, the degraded image is fed into the improved atmospheric scattering model to obtain a coarse restored image, and the gray-world algorithm is used to determine the albedo of this coarse result. In the second stage, the albedo and the first-stage output are fed into the improved atmospheric scattering model to obtain the final restored image. Experimental results indicate that the proposed method avoids color distortion and dark tones in the restored images and has good applicability: it effectively achieves image dehazing, underwater image restoration, and night image enhancement. Compared with state-of-the-art methods, the proposed method achieves excellent results in both quantitative and qualitative experiments.
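The gray-world step is standard and can be sketched as follows: each channel is scaled so that its mean matches the global mean. The improved scattering model itself is not reproduced here.

```python
import numpy as np

def gray_world_albedo(img: np.ndarray) -> np.ndarray:
    """img: float RGB array in [0, 1], shape (H, W, 3)."""
    channel_means = img.reshape(-1, 3).mean(axis=0)
    gains = channel_means.mean() / (channel_means + 1e-8)  # per-channel gain
    return np.clip(img * gains, 0.0, 1.0)
```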

Optics and Precision Engineering
Sep. 25, 2022, Vol. 30 Issue 18 2267 (2022)
Infrared and visible image fusion based on multi-scale dense attention connection network
Yong CHEN, Jiaojiao ZHANG, and Zhen WANG

To address the loss of detail information and insufficient feature extraction in the fusion of infrared and visible light images, a deep learning model for infrared and visible image fusion with multi-scale densely connected attention is proposed. First, multi-scale convolution is designed to extract information at different scales from the infrared and visible images, increasing the feature extraction range within the receptive field and overcoming the insufficient feature extraction of a single scale. Feature extraction is then enhanced through a densely connected network, and an attention mechanism is introduced at the end of the encoding sub-network to tie together global context information and strengthen the focus on important features in the infrared and visible images. Finally, the fully convolutional layers composing the decoding network are used to reconstruct the fused image. Six objective evaluation indicators of image fusion were selected, and fusion experiments on public infrared and visible image datasets show that the proposed algorithm improves on eight other methods: the structural similarity (SSIM) and spatial frequency (SF) indicators increase by an average of 0.26 and 0.45 times, respectively. The fusion results of the proposed method retain clearer edge and target information with better contrast and clarity, and are superior to the compared methods in both subjective and objective evaluations.

Optics and Precision Engineering
Sep. 25, 2022, Vol. 30 Issue 18 2253 (2022)
Unsupervised representation learning for cultural relics based on local-global bidirectional reasoning
Jie LIU, Guohua GENG, Yu TIAN, Yi WANG, Yangyang LIU, and Mingquan ZHOU

Existing representation learning methods for cultural relics require numerous labels, and manual labeling is time-consuming and labor-intensive; furthermore, supervised methods cannot effectively learn the internal structure information of point clouds. We propose an unsupervised representation learning network based on local-global bidirectional reasoning to extract the deep features of ceramic cultural relics. First, a hierarchical encoder based on multi-scale shell convolution is proposed to extract local features at different scales. Second, a local-to-global reasoning module maps the extracted local features to global features, and the differences between the two are measured using metric learning for iterative training. Third, a folding-based decoder reconstructs the acquired global features in a coarse-to-fine manner for a better reconstruction effect. The local-to-global reasoning module supervises only the local representation to be near the global one, and a low-level generation task is used as a self-supervision signal. The global features capture the basic structural information of point clouds, and bidirectional inference between local structures and global shapes at different levels is used to learn point cloud representations. Finally, the learned representations are applied to the downstream task of point cloud classification. Experiments on the Terracotta Warriors and ModelNet40 datasets show that the proposed model significantly improves classification accuracy, reaching 93.33% and 92.02%, respectively, approximately 4.4% and 2.82% higher than the supervised PointNet algorithm. The results demonstrate that our model achieves comparable performance and narrows the gap between unsupervised and supervised learning approaches in downstream object classification tasks.

Optics and Precision Engineering
Sep. 25, 2022, Vol. 30 Issue 18 2241 (2022)
Super-resolution reconstruction method for space target images based on dense residual block-based GAN
Haizhao JING, Jianglin SHI, Mengzhe QIU, Yong QI, and Wenxiao ZHU

To obtain higher-resolution, clearer optical images of space targets, super-resolution reconstruction must be performed on the degraded images corrected by ground-based adaptive optics (AO) imaging telescopes. Deep-learning-based image super-resolution offers fast operation and rich high-frequency detail and has been widely used for natural, medical, and remote sensing images, among other applications. Considering the characteristics of space-target AO images (a plain background, limited resolution, motion blur, turbulence blur, and overexposure), this study proposes a deep-learning-based generative adversarial network (GAN) method for the super-resolution of space-target AO images. A training set of simulated space-target AO images is first constructed for network training, and a GAN super-resolution reconstruction method based on dense residual blocks is then proposed. By replacing the traditional residual network with dense residual blocks to increase network depth and introducing a relativistic average loss function into the discriminator network, the discriminator becomes more robust and the GAN training more stable. Experiments show that the proposed method improves the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) by more than 11.6% and 10.3%, respectively, compared with traditional interpolation-based super-resolution methods, and by 6.5% and 4.9% on average, respectively, compared with a deep-learning-based blind image super-resolution method. The proposed method effectively realizes clear reconstruction of space-target AO images, reduces reconstruction artifacts, enriches image details, and achieves a better reconstruction effect.
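The relativistic average discriminator loss has a standard form (popularized by ESRGAN-style training); a hedged sketch over raw discriminator logits follows.

```python
import torch
import torch.nn.functional as F

def ra_d_loss(real_logits: torch.Tensor, fake_logits: torch.Tensor):
    """Relativistic average loss: each sample is judged relative to the
    mean logit of the opposite set, stabilizing adversarial training."""
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    return (F.binary_cross_entropy_with_logits(real_rel,
                                               torch.ones_like(real_rel)) +
            F.binary_cross_entropy_with_logits(fake_rel,
                                               torch.zeros_like(fake_rel)))
```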

Optics and Precision Engineering
Sep. 10, 2022, Vol. 30 Issue 17 2155 (2022)
Automatic classification of retinopathy with attention ConvNeXt
Wenbo HUANG, Yuxiang HUANG, Yuan YAO, and Yang YAN

Because the differences in image features between classes are small and the classification thresholds of retinopathy are relatively fuzzy, automatic classification algorithms suffer from low recognition and classification accuracy. This paper proposes an automatic retinopathy classification model based on an improved ConvNeXt network. To address the insufficiency of the dataset, horizontal flipping is used to augment the data, and related datasets are introduced to balance the data distribution. To counter the blurring and uneven illumination of fundus images, the Graham method is used to preprocess the images, which also highlights the characteristics of the lesions. An attention-fused ConvNeXt network is proposed to assist doctors in diagnosing retinopathy: an efficient channel attention mechanism is introduced, and an E-Block module is designed to capture cross-channel interaction information while avoiding dimensionality reduction. Transfer learning is used to train all layer parameters of the network, and dropout is added to avoid the overfitting caused by the strong learning ability of the ConvNeXt network. The results show that the sensitivity, specificity, and accuracy of the proposed model are 95.20%, 98.80%, and 95.21%, respectively; compared with ConvNeXt and other networks, the proposed model achieves superior performance indexes for the automatic classification of retinopathy.
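An efficient-channel-attention block of the kind the E-Block builds on can be sketched as follows: channel attention is computed with a 1D convolution over pooled channel descriptors, with no dimensionality reduction. The kernel size is an assumption.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, k_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)

    def forward(self, x):                          # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                     # global average pool -> (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)   # cross-channel interaction
        return x * torch.sigmoid(y)[:, :, None, None]
```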

Optics and Precision Engineering
Sep. 10, 2022, Vol. 30 Issue 17 2147 (2022)
Underwater image enhancement based on color balance and multi-scale fusion
Zhenyu HU, Qi CHEN, and Daqi ZHU

This study proposes an underwater image enhancement algorithm based on color balance and multi-scale fusion to address the color deviation, detail blur, and low contrast of underwater images caused by the absorption and scattering of light in water. A color balance method is first used to correct color. The color-corrected image is then converted from the RGB space to the Lab space, the L channel is processed with contrast-limited adaptive histogram equalization to enhance contrast, and the image is converted back to the RGB space. Finally, multi-scale fusion merges the color-corrected image with the contrast-enhanced image according to weight maps. The enhancement effect of the proposed algorithm was compared with that of other algorithms in terms of visual effect and image quality evaluations. Experiments show that the proposed algorithm removes the color deviation of underwater images and improves their clarity and contrast. Compared with the original image, the entropy, UIQM, and UCIQE of the processed image increase by at least 5.2%, 1.25 times, and 30.8%, respectively, proving that the proposed algorithm effectively improves the visual quality of underwater images.
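The contrast step can be reproduced with OpenCV as in the following minimal sketch; the clip limit, tile size, and input path are assumptions.

```python
import cv2

img = cv2.imread('underwater.png')        # placeholder path; color-corrected input
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
l_eq = clahe.apply(l)                     # enhance the L channel only
enhanced = cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)
```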

Optics and Precision Engineering
Sep. 10, 2022, Vol. 30 Issue 17 2133 (2022)
Neural architecture search algorithm based on voting scheme
Jun YANG, and Jingfa ZHANG

A neural-architecture search algorithm based on a voting scheme is proposed to address the gap between the network architectures found automatically during the search phase and those evaluated afterward. First, because uniform sampling ignores the relative importance of network architectures, training losses measured on small batches of training data are used as performance estimators for sampling candidate networks, concentrating computing resources on high-performance candidates. Second, a group sparsity regularization strategy is adopted to rank all candidate operations at each node; this strategy screens suitable candidate operations and further improves the precision of path selection within the cell structure. Finally, by integrating differentiable architecture search with noise and sparse regularization strategies, the optimal cell structure is selected through a weighted voting scheme, and a network architecture for 3D model recognition and classification is constructed. Experimental results indicate that the constructed network reaches a classification accuracy of 93.9% for 3D models on the ModelNet40 dataset, higher than that of current mainstream algorithms. The proposed algorithm effectively narrows the gap between the network architectures of the search and evaluation phases, resolving the inefficient network training caused by uniform sampling in previous neural-architecture search methods.

Optics and Precision Engineering
Sep. 10, 2022, Vol. 30 Issue 17 2119 (2022)
Skin lesion segmentation based on high-resolution composite network
Liming LIANG, Longsong ZHOU, Jun FENG, Xiaoqi SHENG, and Jian WU

To address foreign-object occlusion, a lack of feature information, and the incorrect segmentation of lesion areas in skin lesion image segmentation, a segmentation method based on a high-resolution composite network is proposed. First, a preprocessing operation refines and expands the skin lesion images to reduce the impact of foreign-object occlusion on segmentation performance. The encoder is then built from a high-resolution network and a multi-scale dense module: the high-resolution network ensures the global transmission of high-definition feature maps, while the multi-scale dense module maximizes the transmission of lesion features, reduces the loss of feature information, and accurately locates lesion areas. The decoder is built from a reverse high-resolution network and a double residual module, which captures deep semantic and spatial information when reconstructing decoding features and improves segmentation accuracy. Experiments on the ISBI2016, ISBI2017, and ISIC2018 datasets yield accuracies of 96.14%, 93.72%, and 95.73%; Dice similarity coefficients of 93.16%, 88.56%, and 92.00%; and Jaccard indices of 87.01%, 77.19%, and 85.19%, respectively. The overall performance is superior to that of existing methods. Simulation experiments show that the high-resolution composite network achieves a superior segmentation effect on skin lesion images, opening new avenues for the diagnosis of skin diseases.

Optics and Precision Engineering
Aug. 25, 2022, Vol. 30 Issue 16 2021 (2022)
Remote sensing image building segmentation using a Resnet fused with fractal geometric features
Shengjun XU, Ruoxuan ZHANG, Yuebo MENG, Guanghui LIU, and Jiuqiang HAN

In remote sensing images, roads, trees, and shadows in the background easily interfere with buildings, usually producing unclear segmentation boundaries. To address this issue, a Resnet network integrating fractal geometric features is proposed. Built on an encoding-decoding framework with the Resnet network as the backbone, the algorithm introduces an atrous spatial pyramid pooling module integrating a fractal prior (FD-ASPP) in the encoding stage, which uses the fractal dimension to capture fractal features of remote sensing images and enhances the geometric description ability of the Resnet network. In the decoding stage, a depthwise separable convolution attention fusion mechanism (DSCAF) is proposed to effectively integrate high-level and low-level features, obtaining richer semantic information and location details. Experiments on the WHU remote sensing image dataset give a precision of 0.944 8, a recall of 0.946 2, an F1 score of 0.945 5, and a mean intersection-over-union (mIoU) of 0.941 5. Compared with existing remote sensing semantic segmentation algorithms for buildings, such as FCN, Segnet, Deeplab V3, U-net, SETR, and AlignSeg, the proposed method achieves better segmentation accuracy, effectively overcomes the interference of roads, trees, shadows, and other factors, and obtains clearer building boundaries.
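The fractal dimension that the FD-ASPP module relies on is typically estimated by box counting; the following illustrative sketch computes it for a binary structure map (the binarization step is assumed).

```python
import numpy as np

def box_counting_dimension(binary: np.ndarray) -> float:
    """Estimate the fractal dimension of a 2D binary map by box counting."""
    sizes, counts = [], []
    size = min(binary.shape) // 2
    while size >= 2:
        # Count boxes of side `size` containing at least one foreground pixel.
        h = binary.shape[0] // size * size
        w = binary.shape[1] // size * size
        view = binary[:h, :w].reshape(h // size, size, w // size, size)
        occupied = view.any(axis=(1, 3)).sum()
        sizes.append(size)
        counts.append(max(occupied, 1))
        size //= 2
    # Dimension = slope of log(count) versus log(1/size).
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)),
                          np.log(np.asarray(counts)), 1)
    return slope
```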

Optics and Precision Engineering
Aug. 25, 2022, Vol. 30 Issue 16 2006 (2022)
Scene recognition for 3D point clouds: a review
Wen HAO, Wenjing ZHANG, Wei LIANG, Zhaolin XIAO, and Haiyan JIN

Intelligent robots can perform several high-risk tasks such as object detection and epidemic prevention to aid human beings. Research on scene recognition has attracted considerable attention in recent years. Scene recognition aims to obtain high-level semantic features and infer the location of a scene, laying a good foundation for simultaneous localization and mapping, autonomous driving, intelligent robotics, and loop detection. With the rapid development of 3D scanning technology, obtaining point clouds of various scenes using various scanners is extremely convenient. Compared with images, the geometric features of point clouds are invariant to drastic lighting and time changes, thus making the process of localization robust. Therefore, scene recognition of point clouds is one of the most important and fundamental research topics in computer vision. This paper systematically expounds the progress and current situation of scene recognition techniques of point clouds, including traditional methods and deep learning methods. Then, several public datasets for scene recognition are introduced in detail. The recognition rates of various algorithms are summarized. Finally, we note the challenges and future research directions of the scene recognition of point clouds. This study will help researchers in related fields to better understand the research status of scene recognition of point clouds quickly and comprehensively and lay a foundation for a further improvement in the recognition accuracy.

Optics and Precision Engineering
Aug. 25, 2022, Vol. 30 Issue 16 1988 (2022)
Semi-supervised dual path network for hyperspectral image classification
Hong HUANG, Zhen ZHANG, Ling JI, and Zhengying LI

Extracting deep discriminative features from hyperspectral images often requires many labeled samples, yet labeling hyperspectral samples is difficult. Exploiting the joint spatial-spectral character of hyperspectral data, a semi-supervised dual path network (SSDPNet) based on deep manifold learning is proposed. In this network, the two paths extract joint spatial-spectral features from a few labeled samples and from many unlabeled samples, respectively. Manifold reconstruction graph models based on supervised and unsupervised graphs are then constructed to explore the manifold structure of hyperspectral images. In addition, a joint loss function based on the mean squared error and manifold learning jointly measures manifold boundary and spatial-spectral probability residuals to realize integrated feedback and optimize the dual path network, resulting in land cover classification. The overall classification accuracies on the WHU-Hi-Longkou and Heihe hyperspectral datasets reach 97.53% and 96.79%, respectively, effectively improving the ability to classify land covers.

Optics and Precision Engineering
Aug. 10, 2022, Vol. 30 Issue 15 1889 (2022)
Removing mixed noise from remote sensing images by wavelet multifractal method
Libo CHENG, Angzhen LI, Xiaoning JIA, and Zhe LI

To remove mixed noise from remote sensing images, a wavelet multifractal denoising algorithm is developed, which uses wavelet analysis for signal decomposition and multifractal analysis to extract image features. First, the image is decomposed by the wavelet transform, and the additive noise is preliminarily processed using an exponential-decay threshold within a wavelet semi-soft thresholding scheme. Second, using multifractal theory, the multifractal spectrum of the noisy image is computed, and an offset operator is constructed to process the additive noise a second time. Then, a sparse gradient set is obtained by multiplying the directional gradient with a two-dimensional mask layer pixel by pixel, and the denoised image is reconstructed. Finally, the evaluation indices of the denoised image are calculated, and the denoising effect is assessed numerically. Experimental results on six images with randomly added noise show that the method effectively removes mixed noise from remote sensing images, with a maximum peak signal-to-noise ratio of 26.700 dB and a highest edge preservation index of 0.449. It meets the visibility and detail-preservation requirements of mixed denoising for remote sensing images and provides a reliable basis for subsequent analysis.

Optics and Precision Engineering
Aug. 10, 2022, Vol. 30 Issue 15 1880 (2022)
Visible light video denoising and FPGA hardware implementation
Sixian ZHAO, Minjie WAN, Weixian QIAN, Lin ZHOU, Ajun SHAO, Qian CHEN, and Guohua GU

Existing filtering algorithms struggle to suppress noise in static scenes, while motion-compensated filtering also fails to suppress noise effectively. To solve these problems, a video denoising algorithm based on spatio-temporal filtering is proposed and implemented on a field-programmable gate array (FPGA). The algorithm uses Gaussian difference filtering to extract image features and then applies spatial filtering to suppress high-frequency noise, while different denoising strategies are adopted for the segmented image regions through feedback. The hardware implementation uses high-level synthesis tools to simplify programming, and a DDR3 control module manages the input and output of the video stream between modules. Simulation results show that the proposed algorithm is effective for denoising: for different scenes, the peak signal-to-noise ratio (PSNR) is up to 15 dB higher than that of a denoising algorithm based on the non-subsampled contourlet transform (NSCT). After porting the algorithm to the FPGA, the PSNR differed from the MATLAB simulation by approximately 0.3 dB, and the running time was shortened by over 71.5%. The algorithm thus achieves a good visible-video denoising effect while satisfying real-time requirements.
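The Gaussian-difference feature extraction at the core of the filter can be sketched with SciPy as follows; the sigma pair is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(frame: np.ndarray,
                            sigma_fine: float = 1.0,
                            sigma_coarse: float = 2.0) -> np.ndarray:
    """Band-pass feature map separating detail from flat regions."""
    fine = gaussian_filter(frame.astype(np.float32), sigma_fine)
    coarse = gaussian_filter(frame.astype(np.float32), sigma_coarse)
    return fine - coarse
```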

Optics and Precision Engineering
Aug. 10, 2022, Vol. 30 Issue 15 1868 (2022)
Lightweight pedestrian detection for multiple scenes
Yunzuo ZHANG, Wenbo LI, Wei GUO, and Zhouchen SONG

Pedestrian detection in multiple scenes is currently a research hotspot in computer vision. Deep learning provides high detection accuracy, but the accompanying high-complexity operations seriously limit its application on mobile platforms. To address this problem, this paper proposes a lightweight pedestrian detection algorithm for multiple scenes. Firstly, a deep and shallow feature fusion network is constructed to learn the texture features of multi-scale pedestrians. Secondly, a cross-dimensional feature-guided attention module is designed to retain the interaction between channel and spatial information during feature extraction. Finally, redundant channels in the model are trimmed using a pruning strategy to reduce the algorithm's complexity. In addition, an adaptive gamma correction algorithm is designed to reduce the influence of external disturbances, such as illumination and shadows, on detection in multiple scenes. Experimental results show that the proposed method compresses the model to 10 MB and reaches a processing speed of 93 frames/s while achieving detection accuracy similar to that of current mainstream methods, which it outperforms overall.
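One common way to realize an adaptive gamma correction is to derive the exponent from the frame's mean brightness, as in this illustrative sketch; the mapping used here is an assumption, not the paper's formula.

```python
import numpy as np

def adaptive_gamma(img: np.ndarray) -> np.ndarray:
    """img: grayscale or RGB float array in [0, 1]. Dark frames are
    brightened (gamma < 1); bright frames are left nearly untouched."""
    mean = img.mean()
    gamma = np.log(0.5) / np.log(mean + 1e-8)   # maps mean brightness to 0.5
    return np.power(img, gamma)
```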

Optics and Precision Engineering
Jul. 25, 2022, Vol. 30 Issue 14 1764 (2022)
CSA-NSGAII algorithm for magnetically shielded room shield lamination optimization
Songnan YANG, Xiaohui ZHANG, Yuanyuan LIU, Jinsheng ZHANG, and Xiaoli XI

To improve the shielding performance of a multilayer magnetic shielding structure and further reduce the construction cost of a magnetically shielded room, this study treats the design of the shielding lamination as a multi-objective optimization problem and optimizes its parameters using the non-dominated sorting genetic algorithm II (NSGAII) under constraints of feasible construction cost and material composition. To overcome the traditional NSGAII's uneven population distribution, poor global search ability, and tendency to fall into local optima, the algorithm is improved with a segmental crossover strategy and an adaptive mutation operator, yielding CSA-NSGAII. Compared with the original NSGAII, NSGAII-SDR, g-NSGAII, and MOEA/D algorithms, CSA-NSGAII performs better on the GD, IGD, and spacing metrics, indicating improved convergence and a more uniform population distribution. Applying the proposed algorithm to the multi-objective design of the magnetic shielding structure, experimental results show that the optimized lamination saves approximately 14% of the construction cost on average while achieving the same shielding performance, and attains approximately 70 dB of shielding in a Helmholtz coil with an interference amplitude of 32 000 nT at a frequency of 1 Hz.

Optics and Precision Engineering
Jul. 25, 2022, Vol. 30 Issue 14 1749 (2022)
Feature point detection for optical and SAR remote sensing images registration
Lina WANG, Huaidan LIANG, Zhongshi WANG, Rui XU, and Guangfeng SHI

Owing to the nonlinear radiation differences between optical and SAR remote sensing images and the influence of SAR speckle noise, existing state-of-the-art algorithms find it difficult to guarantee the repeatability rate of feature points extracted from optical and SAR images, which consequently reduces matching performance. To address these problems, a Harris feature point extraction algorithm based on phase congruency moment features is proposed. Firstly, a blocking strategy was used to divide the input image into several image blocks; secondly, phase congruency intermediate moments were defined; then, phase congruency multi-moment maps were calculated for each image block; and finally, a voting strategy was designed on the phase congruency multi-moment maps. The feature points that appeared in more than half of the multi-moment maps were selected as the final feature points. In this study, simulated optical and SAR images were used as experimental data, and three different feature point detection algorithms were selected for comparison with the proposed algorithm. Experimental results showed that the proposed algorithm can overcome the influence of the nonlinear radiation differences between optical and SAR remote sensing images and of SAR speckle noise, effectively improving the repeatability rate of feature points. The registration results on real optical and SAR images showed that, compared with the other three algorithms, the number of matching points increased by 23, 26, and 35 pairs and the root mean square error decreased by 12.6%, 37.2%, and 40.8%, respectively. The performance of the registration algorithm was thus effectively improved.
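A minimal sketch of a standard Harris response plus the majority-voting rule described above, applied to generic moment maps (the Gaussian scale, k, and threshold ratio are illustrative assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(img: np.ndarray, sigma: float = 1.5,
                    k: float = 0.04) -> np.ndarray:
    """Harris response R = det(M) - k * trace(M)^2, with M the smoothed
    structure tensor; here it would be applied to phase-congruency
    moment maps rather than raw intensities."""
    ix = sobel(img.astype(np.float64), axis=1)
    iy = sobel(img.astype(np.float64), axis=0)
    ixx = gaussian_filter(ix * ix, sigma)
    iyy = gaussian_filter(iy * iy, sigma)
    ixy = gaussian_filter(ix * iy, sigma)
    det = ixx * iyy - ixy ** 2
    return det - k * (ixx + iyy) ** 2

def vote_keypoints(moment_maps, thresh_ratio=0.01):
    """Keep points detected in more than half of the moment maps."""
    votes = np.zeros_like(moment_maps[0], dtype=int)
    for m in moment_maps:
        r = harris_response(m)
        votes += r > thresh_ratio * r.max()
    return votes > len(moment_maps) / 2
```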

Optics and Precision Engineering
Jul. 25, 2022, Vol. 30 Issue 14 1738 (2022)
Optimization of living cell delivery parameters in air-assisted atomization
Xintao YAN, Ce WANG, Yao WANG, Feifei SONG, and Xiaodong WU

To achieve high-efficiency delivery of bio-ink, a novel cell delivery device based on air-assisted atomization was developed, and its atomization characteristics and their effect on cell viability were studied. According to the Rosin-Rammler droplet size distribution law, the uniformity index was quantitatively evaluated. Multi-frame image superposition and local threshold binarization were used to extract the atomization boundary from video so that the atomization angle could be analyzed. By combining BM3D denoising with a binarization algorithm to accurately locate droplet positions, the velocity of the droplets could be calculated. The atomization height and the liquid flow rate are positively correlated with the diameter of the droplets. The pressure of the assisting air has a noticeable effect on droplet uniformity when this pressure is higher than 60 kPa. By controlling the liquid flow rate and spray height, the spray area can be adjusted over a wide range of 50-1 800 mm2. The velocity of the droplets was greatly affected by the spray height. At a spray height of 50 mm, the velocity reached its maximum, with an average value as high as 14 m/s. Both the spray height and the auxiliary air flow rate significantly affect the viability of HaCaT cells. The cell activation rate was 64.01±0.86% and 90.24±0.73% at spray heights of 50 mm and 100 mm, respectively (92.98±3.21% in the negative control group). Within 72 hours after spraying, the viability of HaCaT cells was consistent with that of unsprayed cells. The novel atomization device can be used for high-efficiency delivery of bio-ink, and the results also provide guidance for the optimal design of atomization parameters to achieve spraying of areas of different sizes and high cell-activation-rate delivery.
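The Rosin-Rammler law expresses the cumulative volume fraction of droplets below diameter d as F(d) = 1 - exp(-(d/X)^n), where n is the uniformity index; a minimal fitting sketch with hypothetical measurements:

```python
import numpy as np
from scipy.optimize import curve_fit

def rosin_rammler(d, x_mean, n):
    """Cumulative volume fraction of droplets with diameter < d.
    n is the uniformity index: larger n means a narrower spray."""
    return 1.0 - np.exp(-(d / x_mean) ** n)

# Hypothetical measured size distribution (diameters in micrometres).
diameters = np.array([10, 20, 30, 40, 60, 80, 100], dtype=float)
cum_fraction = np.array([0.05, 0.20, 0.42, 0.60, 0.82, 0.93, 0.98])

(x_mean, n), _ = curve_fit(rosin_rammler, diameters, cum_fraction,
                           p0=(40.0, 2.0))
print(f"characteristic diameter = {x_mean:.1f} um, uniformity n = {n:.2f}")
```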

Optics and Precision Engineering
Jul. 25, 2022, Vol. 30 Issue 14 1725 (2022)
Defect detection in ceramic substrate based on improved YOLOV4
Feng GUO, Qibing ZHU, Min HUANG, and Xiaoxiang XU

A ceramic substrate is an important basic material for semiconductor components, and detecting defects in it is of great significance for ensuring high product quality. An automatic defect detection method for ceramic substrates based on an improved YOLOV4 network is proposed in this paper. To ease the difficulty of defect detection caused by small defect sizes, varying colors and shapes, and large size variation between different kinds of defects in a ceramic substrate, the improved YOLOV4 model optimizes the design of the initial prior boxes by drawing on the Complete Intersection over Union (CIoU) idea. The model then introduces a confidence loss function based on the Gradient Harmonizing Mechanism (GHM) and the Criss-Cross Attention Network (CCNet) to improve defect detection ability. The experimental results show that the average accuracy of the detection method based on the improved YOLOV4 model for ceramic substrate defects, including stains, foreign matter, gold edge bulges, ceramic gaps, and damage, can reach 98.3%. This accuracy meets industry requirements for ceramic substrate defect detection.
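For reference, CIoU augments the IoU with a center-distance penalty and an aspect-ratio consistency term; a minimal sketch of the standard formula (not the paper's full prior-box design procedure):

```python
import math

def ciou(box_a, box_b):
    """Complete IoU between two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter + 1e-9)
    # Squared center distance over squared enclosing-box diagonal.
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + 1e-9
    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (math.atan((ax2 - ax1) / (ay2 - ay1 + 1e-9))
                              - math.atan((bx2 - bx1) / (by2 - by1 + 1e-9))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return iou - rho2 / c2 - alpha * v

print(ciou((0, 0, 10, 10), (2, 2, 12, 12)))   # ~0.44
```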

Optics and Precision Engineering
Jul. 10, 2022, Vol. 30 Issue 13 1631 (2022)
Relocation non-maximum suppression algorithm
Shuzhi SU, Runbin CHEN, Yanmin ZHU, and Bowen JIANG

Non-Maximum Suppression (NMS) is a post-processing algorithm used in object detection. It selects optimal bounding boxes from the bounding box set and suppresses the others. NMS selects the bounding box with the highest classification confidence score as the optimal bounding box; however, it ignores the correlation between localization accuracy and the classification confidence score, and the classification confidence score cannot effectively represent localization accuracy. This paper proposes a novel Relocation Non-Maximum Suppression (R-NMS) algorithm to solve this problem. First, the bounding box with the highest classification confidence score in the bounding box set is selected as the optimal bounding box. Second, a new box distance measurement method is proposed in R-NMS instead of using Intersection over Union (IoU) to measure the distance between bounding boxes. Then, the location information of the bounding boxes around the optimal bounding box is obtained. Finally, this location information is used to relocate the optimal bounding box, yielding a new optimal bounding box. Compared with NMS and Soft-NMS, the mAP of R-NMS on YOLOv3 increased by 0.7% and 0.5%, respectively. The mAP of R-NMS on Faster-RCNN is 80.83%, confirming the effectiveness of the proposed algorithm in improving the mAP of various object detectors.
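For context, the greedy baseline NMS that R-NMS modifies can be sketched as follows (the standard algorithm only; the relocation step itself is not reproduced here):

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5):
    """Classic NMS: greedily keep the highest-scoring box and suppress
    boxes whose IoU with it exceeds the threshold. R-NMS additionally
    relocates each kept box using its suppressed neighbours."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
        order = order[1:][iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
print(nms(boxes, np.array([0.9, 0.8, 0.7])))   # -> [0, 2]
```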

Optics and Precision Engineering
Jul. 10, 2022, Vol. 30 Issue 13 1620 (2022)
Hyperspectral reconstruction from RGB images based on Res2-Unet deep learning network
Beibei SONG, Suina MA, Fan HE, and Wenfang SUN

Because hyperspectral imaging equipment is expensive, a deep learning network that reconstructs high-quality hyperspectral images from easily obtained RGB images was proposed. The proposed network is based on the Unet framework, and its backbone is primarily constructed from Res2Net modules, which can extract fine local and global image features. A channel attention mechanism was introduced to adaptively adjust the channel feature responses, and information of different scales and depths was fully integrated through skip connections between the coding and decoding paths. Finally, the network was trained and tested on the dataset provided by the New Trends in Image Restoration and Enhancement (NTIRE) 2020 international challenge. Experiments show that, compared with the adaptive weighted attention network (AWAN) and hierarchical regression network (HRNet), the proposed method obtains the best results on four objective evaluation metrics: the mean of relative absolute error (MRAE), root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and mean of spectral angle mapper (MSAM). Compared with AWAN and HRNet, the proposed method improves the mean PSNR by 0.08 dB and 1.73 dB, respectively, on the clean track, and by 0.72 dB and 0.97 dB, respectively, on the real-world track. The reconstructed images also show subjective quality closer to the hyperspectral reference images in both low-frequency flat areas and high-frequency texture areas.
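The four evaluation metrics named above can be written compactly; a sketch assuming reconstructed and reference cubes of shape (H, W, bands) with values in [0, 1]:

```python
import numpy as np

def mrae(pred, ref):
    """Mean relative absolute error between hyperspectral cubes."""
    return np.mean(np.abs(pred - ref) / (np.abs(ref) + 1e-6))

def rmse(pred, ref):
    return np.sqrt(np.mean((pred - ref) ** 2))

def psnr(pred, ref, peak=1.0):
    return 10 * np.log10(peak ** 2 / np.mean((pred - ref) ** 2))

def msam(pred, ref):
    """Mean spectral angle mapper: average angle (radians) between
    predicted and reference spectra at each pixel."""
    dot = np.sum(pred * ref, axis=-1)
    denom = np.linalg.norm(pred, axis=-1) * np.linalg.norm(ref, axis=-1)
    return np.mean(np.arccos(np.clip(dot / (denom + 1e-9), -1.0, 1.0)))
```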

Optics and Precision Engineering
Jul. 10, 2022, Vol. 30 Issue 13 1606 (2022)
Detection of foreign object debris on night airport runway fusion with self-attentional feature embedding
Zifen HE, Guangchen CHEN, Sen WANG, Yinhui ZHANG, and Linwei GUO

Foreign object debris (FOD) on an airport runway threatens aircraft safety during takeoff and landing, especially at night. This study introduces an intelligent vision algorithm to detect debris on airport runways at night. Considering the low detection accuracy of existing models owing to their tendency to focus on local features, a CSPTNet debris detection algorithm fused with self-attentional feature embedding is proposed. This algorithm replaces the standard BottleNeck module prevalent in conventional models with a Transformer BottleNeck module. In addition, the feature map is flattened into patches and embedded with positional feature encoding to transform the image representation from pixel format to vector format. After capturing the relationships between pixels in a high-dimensional vector space, a multi-head self-attention mechanism is employed to fuse global and local features by aggregating feature information from the different attention branch subspaces. To solve the problems of blurred contour edges and difficult positioning due to the small scale of objects in the datasets, we introduce the CIoU loss function to optimize predicted box sizes and center positions, enhancing the positioning accuracy of foreign object contours. The experimental results show that the detection speed of this algorithm reaches 38 frames/s, which meets the requirements of real-time detection, and its average accuracy is 88.1%. Compared with the standard bottleneck module, accuracy is increased by 5.7% through the Transformer BottleNeck module fused with self-attentional feature embedding. In addition, compared with the state-of-the-art model YOLOv5, ours is 5.2% more accurate. The obtained results demonstrate the effectiveness and engineering practicability of CSPTNet for FOD detection on airport runways at night.
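A minimal NumPy sketch of multi-head self-attention over embedded patch tokens (the square weight shapes and the absence of an output projection are simplifying assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, wq, wk, wv, n_heads):
    """Self-attention over a sequence of patch tokens x of shape
    (tokens, dim); each head attends in its own subspace, and the
    head outputs are concatenated back to (tokens, dim)."""
    tokens, dim = x.shape
    dh = dim // n_heads
    q = (x @ wq).reshape(tokens, n_heads, dh).transpose(1, 0, 2)
    k = (x @ wk).reshape(tokens, n_heads, dh).transpose(1, 0, 2)
    v = (x @ wv).reshape(tokens, n_heads, dh).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, t, t)
    out = attn @ v                                          # (heads, t, dh)
    return out.transpose(1, 0, 2).reshape(tokens, dim)

x = np.random.randn(16, 64)                     # 16 patch tokens, dim 64
w = [np.random.randn(64, 64) * 0.1 for _ in range(3)]
y = multi_head_self_attention(x, *w, n_heads=4) # (16, 64)
```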

Optics and Precision Engineering
Jul. 10, 2022, Vol. 30 Issue 13 1591 (2022)
Infrared multi-target dual-station positioning based on maximum density estimation in track direction
Juan YUE, Fanming LI, and Sili GAO

This study aims to reduce the influence of measurement errors on multi-target positioning with dual stations. By using the spatio-temporal distribution characteristics of a target's track points over a short time period, an infrared moving multi-target dual-station positioning method based on maximum track-direction density estimation is proposed. First, single-frame multi-target matching is performed based on the elevation difference along the direction-finding rays of the dual stations. Then, based on a two-dimensional direction histogram, the target track direction is preliminarily estimated, after which the maximum density of the target track direction is determined using the mean shift. Finally, the authenticity of each track point is validated against the target track direction to reduce the influence of measurement errors on the positioning result. The experimental results reveal that the proposed method effectively eliminates mismatched points and reduces error deviation points. The maximum track fitting error is less than 0.5 m, and the average fitting error is less than 0.3 m, improvements over existing algorithms. For targets that exhibit both mismatched points and larger error deviations compared with those of the histogram method, the maximum fitting error of the proposed method is reduced by more than 50%, and the average fitting error is reduced by 27%. Thus, the proposed method can effectively reduce positioning error, with important applications in military and civilian fields such as three-dimensional positioning, target prediction, and shooting training evaluation.
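The mean-shift step that locates the maximum-density track direction can be sketched as follows (1-D Gaussian-kernel mean shift; the bandwidth and initialization are assumptions, with the real start point coming from the coarse direction histogram):

```python
import numpy as np

def mean_shift_mode(samples: np.ndarray, bandwidth: float = 5.0,
                    tol: float = 1e-3, max_iter: int = 100) -> float:
    """Iterate toward the densest value among 1-D direction samples
    (angles in degrees) using a Gaussian kernel."""
    x = samples.mean()                     # assumed initial estimate
    for _ in range(max_iter):
        w = np.exp(-0.5 * ((samples - x) / bandwidth) ** 2)
        x_new = np.sum(w * samples) / np.sum(w)
        if abs(x_new - x) < tol:
            break
        x = x_new
    return x

angles = np.concatenate([np.random.normal(42, 2, 80),     # true track
                         np.random.uniform(0, 180, 20)])  # error deviations
print(mean_shift_mode(angles))   # ~42, robust to the outliers
```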

Optics and Precision Engineering
Jun. 25, 2022, Vol. 30 Issue 12 1509 (2022)
Improved CycleGAN network for underwater microscopic image color correction
Haotian WANG, Qingsheng LIU, Liang CHEN, Wangquan YE, Yuan LU, Jinjia GUO, and Ronger ZHENG

The absorption and scattering of light by marine water and suspended particles lead to color distortion in underwater microscopic images. This paper presents an improved cycle generative adversarial network (CycleGAN) algorithm for effectively correcting the color of microscopic images of underwater targets. A structural similarity index (SSIM) loss function over the R, G, and B channels, which measures the loss of color information between images, was added between the original underwater images and the reconstructed images, so that the color of the R, G, and B channels was accurately regulated. This enhanced not only the overall performance of the CycleGAN but also the quality of the images produced by the generator. Subsequently, the improved network was trained on a dataset consisting of underwater images of multicolor self-made targets and microscopic images of natural stones. The trained network model was used to correct the color of microscopic images of underwater stones. The results showed that the improved CycleGAN algorithm had distinct advantages in color correction over other methods. The peak signal-to-noise ratio and SSIM of the images processed by this algorithm were 41.85% and 35.62% higher, respectively, than those processed by the traditional Retinex algorithm. Moreover, in terms of subjective vision, the corrected underwater microscopic images had the highest color similarity to images taken in air. In conclusion, this method can effectively correct the color of underwater target images and improve the quality of underwater microscopic images. It can be applied in marine geology and marine biology.
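A simplified per-channel SSIM loss in the spirit of the one described above, using global image statistics rather than the usual local windows (the constants are the conventional SSIM defaults):

```python
import numpy as np

def ssim_channel(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified single-channel SSIM from global statistics; the full
    formulation averages the same expression over local windows."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def ssim_color_loss(img_a, img_b):
    """Per-channel SSIM loss over R, G, B, as added between the original
    underwater image and the cycle-reconstructed image."""
    return np.mean([1.0 - ssim_channel(img_a[..., c], img_b[..., c])
                    for c in range(3)])
```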

Optics and Precision Engineering
Jun. 25, 2022, Vol. 30 Issue 12 1499 (2022)
Lightweight Mars remote sensing image super-resolution reconstruction network
Mingkun GENG, Fanlu WU, and Dong WANG

A lightweight Laplacian-pyramid image super-resolution reconstruction convolutional neural network based on the deep Laplacian pyramid network (LapSRN) is proposed to address the large number of parameters used in deep learning-based super-resolution reconstruction methods. First, shallow features are extracted from the input low-resolution (LR) image. Subsequently, using recursive blocks that allow parameter sharing and contain shared-source skip connections, deep features are extracted from the shallow features, and a residual image (RI) containing high-frequency information is inferred. Next, the RI and the input LR image are upsampled via a transposed convolutional layer and added pixel by pixel to obtain the super-resolution image. The total number of parameters used in this method is only 3.98% of that used in the LapSRN for three scales, and the peak signal-to-noise ratio increases by 0.031 3 dB and 0.116 7 dB under 4-times and 8-times super-resolution, respectively. The proposed method reduces the number of parameters by 81.6%, 90.8%, and 88.8% under 2-times, 4-times, and 8-times super-resolution, respectively, while maintaining the super-resolution effect.

Optics and Precision Engineering
Jun. 25, 2022, Vol. 30 Issue 12 1487 (2022)
Vehicle detection based on FVOIRGAN-Detection
Hao ZHANG, Jianhua YANG, and Haiyang HUA

To solve the problem of spatial information loss in point cloud processing and to extract the texture information of visible images to the greatest extent during fusion, a vehicle detection method based on the fusion of laser point clouds and visible images is proposed. A front-view point cloud processing approach that preserves the original information is incorporated into the CrossGAN-Detection method: the point cloud is projected to the front-view angle, and each dimension of the original point cloud information is sliced into its own feature channel, significantly improving the utilization efficiency of the point cloud information without reducing network performance. The idea of relative probability is introduced: the discriminator judges images by relative rather than absolute realness, so that the extracted texture information is fused. The experimental results show that the AP indexes of this method on the easy, medium, and difficult categories of the KITTI dataset are 97.67%, 87.86%, and 79.03%, respectively. In scenes with limited light, the AP index reaches 88.49%, which is 2.37% higher than that of the CrossGAN-Detection method. Hence, target detection performance is improved.
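One plausible form of the front-view projection with per-dimension channel slicing (a sketch under an assumed image size and vertical field of view, not the paper's exact parameters):

```python
import numpy as np

def front_view_slices(points, h=64, w=512, fov_up=3.0, fov_down=-25.0):
    """Project a LiDAR point cloud (N, 4: x, y, z, intensity) onto a
    front-view image, keeping each original dimension as its own
    channel so no raw information is discarded."""
    x, y, z, intensity = points.T
    depth = np.sqrt(x ** 2 + y ** 2 + z ** 2) + 1e-9
    yaw = np.arctan2(y, x)                        # azimuth
    pitch = np.arcsin(z / depth)                  # elevation
    fov = np.radians(fov_up - fov_down)
    u = ((yaw + np.pi) / (2 * np.pi) * w).astype(int) % w
    v = ((np.radians(fov_up) - pitch) / fov * h).clip(0, h - 1).astype(int)
    img = np.zeros((h, w, 5), dtype=np.float32)   # x, y, z, i, depth
    img[v, u] = np.stack([x, y, z, intensity, depth], axis=1)
    return img

cloud = np.random.randn(1000, 4)                  # stand-in point cloud
fv = front_view_slices(cloud)                     # (64, 512, 5) channels
```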

Optics and Precision Engineering
Jun. 25, 2022, Vol. 30 Issue 12 1478 (2022)
Virtual-real fusion with geometric consistency based on two-dimensional affine transformation
Jiawei TENG, Yan ZHAO, Aijia ZHANG, Shigang WANG, and Xiaokun WANG

With the rapid development of three-dimensional technologies, geometric and illumination consistency play an important role in the realistic rendering of virtual objects superimposed on real scenes. To construct a coordinate measurement system linking the real and virtual scenes of a stereo camera, realize geometric and illumination consistency in virtual-real fusion, and improve the realism of the fusion, a geometrically consistent virtual-real fusion method is proposed. A transformation from three-dimensional to two-dimensional space was used for the detailed calculation of the geometric relationship between the virtual object and the real scene. Following this, a two-dimensional affine transformation was applied to the two-dimensional images for a more accurate calculation. Finally, combined with illumination consistency estimation, the virtual object was accurately inserted into the real scene using the differential rendering method to realize virtual-real fusion. The experimental results show that the virtual object can be fused into the real scene consistently in terms of size and volume. The geometric consistency after fusion is increased by more than 15%. This verifies the effectiveness of the method and lays a foundation for augmented reality applications.
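A two-dimensional affine transformation maps a point (x, y) to (a11·x + a12·y + tx, a21·x + a22·y + ty); a minimal least-squares estimation sketch from point correspondences (generic technique, not the paper's full pipeline):

```python
import numpy as np

def fit_affine_2d(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 2-D affine transform (2x3 matrix) mapping src
    points to dst points; needs at least three non-collinear pairs."""
    n = len(src)
    a = np.zeros((2 * n, 6))
    a[0::2, 0:2], a[0::2, 2] = src, 1.0   # rows for the x-equations
    a[1::2, 3:5], a[1::2, 5] = src, 1.0   # rows for the y-equations
    b = dst.reshape(-1)                   # interleaved x0, y0, x1, y1, ...
    params, *_ = np.linalg.lstsq(a, b, rcond=None)
    return params.reshape(2, 3)

src = np.array([[0, 0], [1, 0], [0, 1]], float)
dst = np.array([[1, 2], [2, 2], [1, 3]], float)   # pure translation
print(fit_affine_2d(src, dst))   # [[1, 0, 1], [0, 1, 2]]
```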

Optics and Precision Engineering
Jun. 10, 2022, Vol. 30 Issue 11 1374 (2022)
Navigation system and strategies for electric inspecting UAV autonomously landing
Yingchun ZHONG, Wenxiang ZHANG, Bo WANG, Heer HUANG, and Huiqing HE

For electric power inspection unmanned aerial vehicles (UAVs), autonomous landing at distributed airports is difficult, particularly when the landing cannot be performed accurately, safely, and reliably owing to unsatisfactory light conditions. In this study, a touch-down navigation system for inspection UAVs was developed, a set of systematic strategies for landing UAVs at distributed airports was proposed, and a novel criterion for selecting the landing strategy was introduced. First, a touch-down navigation system was established, which included the inspection UAV as well as ultra-wideband (UWB) positioning modules on the ground and at the distributed airports. Second, based on an open-source flight control system, an exclusive landing flight controller was designed and embedded into the UAV, and a novel speed controller and landing algorithm were designed. Third, an evaluation criterion for the landing strategy was proposed, and systematic landing strategies were devised to accommodate challenging circumstances. Experimental results indicate the following: the landing error is within 0.3 m, and the reliability is 100%; the touch-down navigation system can guide the UAV to land safely and accurately even when one or two UWB modules on the ground are disabled; and each landing strategy presents its own advantages and can accommodate different landing circumstances. The touch-down navigation system satisfies the requirements of UAVs for autonomous landing, and the proposed landing strategies significantly improve landing reliability, providing a foundation for the construction of autonomous UAV systems for power grid inspection.

Optics and Precision Engineering
Jun. 10, 2022, Vol. 30 Issue 11 1362 (2022)
Image dehazing method based on adaptive bi-channel priors
Yutong JIANG, Zhonglin YANG, Mengqi ZHU, Yi ZHANG, and Lixia GUO

Images are an important source of information for modern warfare, but image quality degrades in foggy environments, seriously hindering photoelectric reconnaissance and identification. To improve the effective utilization of images in foggy environments, an adaptive bi-channel prior image dehazing method was developed. First, based on the dark channel prior and bright channel prior theories, hazy images are converted from RGB to HSV color space, and thresholds on the saturation and luminance components are used to detect white or light pixels and black or dark pixels that do not satisfy the dark and bright channel priors, respectively. Then, superpixels are selected as the local areas for calculating the dark and bright channels, and the local transmittance and atmospheric light values are estimated. Finally, adaptive bi-channel priors are developed to rectify incorrect estimates of the transmission and atmospheric light values for both white and black pixels. The transmittance map and atmospheric light map are refined by a guided filter and then substituted into the atmospheric scattering model to obtain a clear dehazed image. Experimental results show that the dehazed images restore true color, the visual effect is natural and clear, and the dehazing process is accurate and efficient. On the FRIDA database, the mean square error between the dehazed image and the ground truth obtained using the proposed method is better than that of existing methods, approximately 15% lower than that yielded by the BiCP method.
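A minimal sketch of the dark/bright channel computations and the atmospheric scattering model they feed into (a fixed patch size and a hand-set atmospheric light are assumed here; the adaptive, superpixel-based estimation is not reproduced):

```python
import numpy as np
from scipy.ndimage import minimum_filter, maximum_filter

def dark_channel(img: np.ndarray, patch: int = 15) -> np.ndarray:
    """Per-pixel minimum over RGB followed by a local minimum filter;
    haze-free regions have values near zero."""
    return minimum_filter(img.min(axis=2), size=patch)

def bright_channel(img: np.ndarray, patch: int = 15) -> np.ndarray:
    """The dual prior, using per-pixel maxima and a local maximum filter."""
    return maximum_filter(img.max(axis=2), size=patch)

def transmission(img, airlight, patch=15, omega=0.95):
    """Transmission estimate t = 1 - omega * dark_channel(I / A)."""
    return 1.0 - omega * dark_channel(img / airlight, patch)

hazy = np.random.rand(100, 100, 3)        # stand-in hazy image
a = np.array([0.9, 0.9, 0.9])             # assumed atmospheric light
t = transmission(hazy, a)
# Atmospheric scattering model inverted: J = (I - A) / t + A.
recovered = (hazy - a) / np.clip(t[..., None], 0.1, 1.0) + a
```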

Optics and Precision Engineering
May. 25, 2022, Vol. 30 Issue 10 1246 (2022)
High frequency signal reconstruction based on compressive sensing and equivalent-time sampling
Ning JING, Dingyi YAO, Zhibin WANG, Minjuan ZHANG, and Rui ZHANG

A simple harmonic wave with a frequency of 10–100 GHz is collected by a domestic equivalent-time optical sampling oscilloscope to measure and recover high-frequency signals in undersampling situations. The oscilloscope contains a trigger sequence with a 5 ps delay resolution and a 10 μs dynamic range. The trigger sequence, generated in two steps by coarse and fine delayers, drives the high-bandwidth sampler, and the sampled value is output by an ADC at a frequency of 50 kHz. In this arrangement, the high-frequency signal is sampled with a delay that increases by 5 ps every 20 μs. The compression ratio is approximately 106, and the sampling rate is far below the Nyquist requirement. Using compressive sensing theory, the measurement matrix is constructed from the Fourier transform and the equivalent-time sampling sequence, sparsifying the signal measurement process. The measured signal is reconstructed by solving an L1-norm minimization problem. The results demonstrate that a signal with a frequency of 100 GHz can be undersampled and reconstructed with a mean square error below 5×10-5, implying that the dynamic range of the sampling oscilloscope can be effectively expanded.
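One standard solver for the L1-norm minimization step is iterative shrinkage-thresholding (ISTA); a toy-scale sketch under a random sensing matrix (the paper's actual measurement matrix and solver may differ):

```python
import numpy as np

def ista(a: np.ndarray, y: np.ndarray, lam: float = 0.01,
         n_iter: int = 500) -> np.ndarray:
    """ISTA for min 0.5*||Ax - y||^2 + lam*||x||_1: a gradient step on
    the data term followed by soft-thresholding on the L1 term."""
    step = 1.0 / np.linalg.norm(a, 2) ** 2   # 1 / Lipschitz constant
    x = np.zeros(a.shape[1])
    for _ in range(n_iter):
        g = x - step * a.T @ (a @ x - y)
        x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)
    return x

# Undersampled recovery of a spectrally sparse signal (toy scale).
n, m = 256, 32                                 # ambient dim, measurements
rng = np.random.default_rng(0)
x_true = np.zeros(n)
x_true[[10, 80]] = [1.0, -0.5]                 # sparse "spectrum"
a = rng.standard_normal((m, n)) / np.sqrt(m)   # stand-in sensing matrix
x_hat = ista(a, a @ x_true, lam=0.005, n_iter=2000)
print(np.abs(x_hat - x_true).max())            # small residual error
```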

Optics and Precision Engineering
May. 25, 2022, Vol. 30 Issue 10 1240 (2022)
A multivariate information aggregation method for crowd density estimation and counting
Guanghui LIU, Qinmeng WANG, Xuanrun CHEN, and Yuebo MENG

Crowd density estimation counts the crowd distribution and quantity in a crowded scene, which is vital to safety systems and traffic control. A multivariate information aggregation method is proposed herein to address the difficulties of feature extraction and spatial semantic information acquisition and the insufficient feature fusion in crowd density estimation for high-density images. First, a multi-information extraction network is designed, where VGG-19 is used as the skeleton network to enhance the depth of feature extraction, and a multilayer semantic supervision strategy is adopted to encode low-level features and improve their semantic representation. Second, a multiscale contextual information aggregation network is designed based on spatial information embedded into the high-level feature space, and two lightweight spatial pyramid structures with strided convolution are applied to reduce the redundancy of model parameters during global multiscale context information aggregation. Finally, strided convolution is performed at the end of the network to accelerate network operation without affecting precision. The ShanghaiTech, UCF-QNRF, and NWPU datasets are used for comparison experiments. The experimental results demonstrate that the MAE and MSE on Part_A of the ShanghaiTech dataset are 59.4 and 96.2, respectively, whereas those on Part_B are 7.7 and 11.9, respectively. The ultra-dense multiview-scene UCF-QNRF dataset yields an MAE and MSE of 89.3 and 164.5, respectively, and the high-density NWPU dataset yields an MAE and MSE of 87.9 and 417.2, respectively. The proposed method performs better than the comparison methods, as confirmed in practical applications.

Optics and Precision Engineering
May. 25, 2022, Vol. 30 Issue 10 1228 (2022)
Single-image translation based on multi-scale dense feature fusion
Qihang LI, Long FENG, Qing YANG, Yu WANG, and Guohua GENG

To solve the problems of low image quality and poor detail features in images generated by existing single-image translation models, a single-image translation model based on multi-scale dense feature fusion is proposed in this paper. First, the model uses the idea of a multi-scale pyramid structure to downsample the original and target images to obtain input images of different sizes. Then, in the generator, images of different sizes are input into dense feature modules for style feature extraction, style is transferred from the original image to the target image, and the required translated image is generated through continuous adversarial training against the discriminator. Finally, dense feature modules are added at each stage of training by incrementally growing the generator, which realizes the migration of generated images from global to local styles and produces the required translated images. Extensive experiments have been conducted on various unsupervised image translation tasks. The experimental results demonstrate that, in contrast to existing methods, the training time of this method is shortened by 80%, and the SIFID value of the generated images is reduced by 22.18%. Therefore, the proposed model can better capture the distribution difference between the source and target domains and improve the quality of image translation.

Optics and Precision Engineering
May. 25, 2022, Vol. 30 Issue 10 1217 (2022)
Image registration based on residual mixed attention and multi-resolution constraints
Mingna ZHANG, Xiaoqi LÜ, and Yu GU

Medical image registration has great significance in clinical applications such as atlas creation and time-series image comparison. Compared with traditional methods, deep learning-based registration meets clinical real-time requirements; however, its accuracy still needs improvement. Based on this observation, this paper proposes a registration model named MAMReg-Net, which combines residual mixed attention and multi-resolution constraints to realize the non-rigid registration of brain magnetic resonance imaging (MRI). By adding the residual mixed attention module, the model can obtain a large amount of local and non-local information simultaneously and extract more effective internal structural features of the brain during network training. Secondly, a multi-resolution loss function is used to optimize the network, making training more efficient and robust. The average Dice score over the 12 anatomical structures in T1 brain MR images was 0.817, the average ASD score was 0.789, and the average registration time was 0.34 s. Experimental results demonstrate that the MAMReg-Net registration model can learn brain structure features effectively, improving registration accuracy while meeting clinical real-time requirements.

Optics and Precision Engineering
May. 25, 2022, Vol. 30 Issue 10 1203 (2022)
Single-view 3D object reconstruction based on NFFD and graph convolution
Yuanfeng LIAN, Shoushuang PEI, and Wei HU

To address the inaccurate single-view three-dimensional (3D) object reconstruction results caused by complex object topologies and missing irregular surface details, a novel single-view 3D object reconstruction method combining non-uniform rational B-spline (NURBS) free-form deformation with a graph convolutional neural network is proposed. First, a control point generation network, which introduces a connection weight policy, is used for feature learning on two-dimensional views to obtain the control point topology. Subsequently, the NURBS basis functions are used to establish the deformation relationship between the vertex contours of the point cloud model. Finally, to enhance details, a convolutional network embedded with a mixed attention module is used to adjust the positions of the deformed point cloud, efficiently reconstructing complex topological structures and irregular surfaces. Experiments on ShapeNet data show that the average values of the CD and EMD indices are 3.79 and 3.94, respectively, and good reconstruction is achieved on the Pix3D real-scene dataset. In contrast to existing single-view point cloud 3D reconstruction methods, the proposed method offers higher reconstruction accuracy of 3D objects and demonstrates higher robustness.

Optics and Precision Engineering
May. 25, 2022, Vol. 30 Issue 10 1189 (2022)
Low false alarm infrared target detection in airborne complex scenes
Dezhen YANG, Songlin YU, Jinjun FENG, Jiangyong LI, and Lihe WANG

When an infrared photoelectric detection system detects a target in a complex airborne scene, the spatial distribution of the ground false alarm interference source is consistent with the spatial distribution of the small dim target. Therefore, a multi-dimensional feature association detection algorithm based on moving target features was proposed herein. First, feature points were detected in complex scenes, and a frame skipping mechanism based on the relative velocity-height ratio was introduced. Candidate targets were detected by inter-frame image difference after image registration. Simultaneously, multi-dimension and multi-frame correlations based on the kernel correlation filter were used to suppress false alarms. In an airborne environment where the vehicle speed-to-height ratio is greater than 30 mrad/s and frame time is less than 10 ms, the average detection rate of this algorithm is 99.13%, and the false alarm rate is 10-5. This method was verified in various complex scenarios. In addition, it is suitable for pipeline parallel operation and meets the engineering needs.

Optics and Precision Engineering
Jan. 15, 2022, Vol. 30 Issue 1 96 (2022)
Transfer learning techniques for semantic segmentation of machine vision inspection and identification based on label-reserved Softmax algorithms
Guixiong LIU, and Jian HUANG

A convolutional neural network (CNN) model for machine vision inspection and identification can identify and measure the components, size, and other features of an object under test. Herein, a fine-tuning transfer learning technique for semantic segmentation based on a label-reserved softmax algorithm is proposed. First, transfer learning modeling of semantic segmentation for machine vision inspection and identification was performed; transferring more CNN model weights reduces the initial loss of the model. Second, a fine-tuning transfer learning method based on the label-reserved softmax algorithm was proposed, which realizes fine-tuning transfer learning with all model weights for slightly different detected objects. Experiments based on custom-developed datasets show that the training time needed for models to satisfy the requirements of machine vision inspection and identification is reduced from 42.8 min to 30.1 min. Application experiments show that this transfer learning technique enables semi-supervised learning for the inspection of standard component installation, the inspection of missed and mis-installation cases, and the identification of assembly quality. The training time for transfer learning on a new chassis is less than 20.2 min, and the inspection accuracy reaches 100%. The fine-tuning transfer learning technique is effective and satisfies the requirements of machine vision inspection and identification.

Optics and Precision Engineering
Jan. 15, 2022, Vol. 30 Issue 1 117 (2022)
Three-dimensional reconstruction and recognition of weld defects based on magneto-optical imaging
Yukun JI, Congyi WANG, Qianwen LIU, Yanxi ZHANG, and Xiangdong GAO

Nondestructive testing of the surface and subsurface of welding defects is key to ensuring the quality of welded products. A three-dimensional (3D) reconstruction method for welding defects based on Faraday magneto-optical imaging (MOI) is investigated to realize shape and size recognition of welding defects. First, based on the principle of MOI, the correspondence between the magnetic induction intensity of the magnetic leakage field and the MOI is analyzed. Subsequently, using a pulsed laser welding pit (3 mm × 0.3 mm × 0.25 mm) as the research object, a 3D finite element magnetic field simulation model of the pit is established to investigate the distribution of the leakage field's magnetic induction intensity. Moreover, a two-dimensional plane contour of the welding pit defect is extracted via image digitization and the pixel value distribution of the MOI, and a gradient-deviation algorithm is designed to construct the depth information. Finally, the 3D profile of the welding defect is obtained. Results show that the farther from the center point of the welding pit defect, the greater the magnetic field intensity, while the closer to the center point along the Y-axis direction, the larger the gradient of the field intensity change. The maximum depth of the pits is between 150 and 200 μm, and the differences in the average and median depths relative to confocal microscope measurements are 0.1 μm and 2 μm, respectively. MOI technology affords high identification accuracy and can realize the 3D contour reconstruction of welding defects.

Optics and Precision Engineering
Jan. 15, 2022, Vol. 30 Issue 1 108 (2022)
Automatic location of anatomical points in head MRI based on the scale attention hourglass network
Sai LI, Hao-jiang LI, Li-zhi LIU, Tian-qiao ZHANG, and Hong-bo CHEN

To automate the location of stable anatomical points in head magnetic resonance imaging (MRI), an automated anatomical point locating procedure relying on an hourglass network (HN) is proposed. In this method, the basic HN structure is used to extract and fuse multi-scale features, and a scale attention mechanism is introduced in the fusion of multi-scale features to improve location accuracy. The method uses a differentiable spatial to numerical transform (DSNT) layer to locate anatomical points by coordinate regression on the predicted heat map generated by the convolutional neural network. Five hundred head MRI images were used for training, and three hundred images were used for testing. The accuracy of the proposed method for locating four anatomical points was >80%. Compared with the common key point location methods currently in use, the proposed method achieved the best results. This method can assist doctors in marking anatomical points in images and provide technical support for the automated registration of head MRI and big-data analyses of head diseases.
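The DSNT layer reduces a predicted heat map to coordinates as the expectation of the normalized map over a coordinate grid; a minimal sketch (normalized [-1, 1] grids assumed):

```python
import numpy as np

def dsnt(heatmap: np.ndarray) -> tuple:
    """Expected (x, y) coordinate under the normalized heat map;
    differentiable in the network because it is a weighted sum."""
    h, w = heatmap.shape
    p = heatmap / (heatmap.sum() + 1e-9)
    xs = np.linspace(-1, 1, w)          # normalized coordinate grids
    ys = np.linspace(-1, 1, h)
    return (p.sum(axis=0) @ xs, p.sum(axis=1) @ ys)

heat = np.zeros((64, 64))
heat[20, 30] = 1.0                      # a peak at row 20, column 30
print(dsnt(heat))                       # sub-pixel (x, y) location
```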

Optics and Precision Engineering
Sep. 15, 2021, Vol. 29 Issue 9 2278 (2021)
Asymmetrically clipped optical OFDM with generalized index modulation for atmospheric turbulent channel
Hui-qin WANG, Hong-xia DOU, Ming-hua CAO, Yu-kun MA, and Qing-bin PENG

Optical orthogonal frequency division multiplexing (OFDM) with index modulation has the advantage of high spectral efficiency (SE); however, its error performance is not optimal. In this paper, a scheme of asymmetrically clipped optical OFDM with generalized index modulation (ACO-OFDM-GIM) is proposed. In this scheme, the number of active subcarriers in each sub-block can be one or more, and these subcarriers additionally carry frequency-domain modulation. Furthermore, a subcarrier allocation algorithm is adopted to eliminate the correlation between adjacent subcarriers, so that better error performance may be achieved. Taking ACO-OFDM-GIM as an example, the modulation mapping principle of O-OFDM-GIM is introduced in detail. In addition, the asymptotic bit error probability of the ACO-OFDM-GIM scheme for the turbulence channel is derived, and its correctness is verified via simulation. Furthermore, the performance of the ACO-OFDM-GIM system is compared with that of the ACO-OFDM and ACO-OFDM-IM systems. The results show that the transmission rate and error performance of the ACO-OFDM-GIM scheme are improved compared with the ACO-OFDM-IM and ACO-OFDM schemes. At identical SE, the error performance of the proposed scheme is better than that of the ACO-OFDM and ACO-OFDM-IM systems at large signal-to-noise ratios (SNRs). At a bit error rate of 1×10-4 under a strong turbulence channel, the SNR of the (4,[1,2]) ACO-OFDM-GIM scheme outperforms that of the (4,2) ACO-OFDM-IM and ACO-OFDM schemes by nearly 2.5 dB and 4.5 dB, respectively. Therefore, the ACO-OFDM-GIM scheme is expected to effectively improve the transmission rate of atmospheric laser communication in the future.

Optics and Precision Engineering
Sep. 15, 2021, Vol. 29 Issue 9 2268 (2021)
Sparse mixture iterative closest point registration
Yue-sheng LIU, Xin-du CHEN, Lei WU, Yun-bao HUANG, and Hai yan LI

The sparse mixture iterative closest point (SM-ICP) method is proposed to achieve accurate alignment of point sets while avoiding the influence of outliers. This study covers sparse representation, non-convex optimization, and point-set registration. First, the registration residuals are represented by mixed regularization to establish a sparse mixture formulation. The alternating direction method of multipliers (ADMM) is then integrated to solve the proposed formulation in a nested framework. Among the variables, the balance weight θ of the mixed regularization can be calculated using a sigmoid function, and a scalar version is provided to represent the corresponding loss function in the inner ADMM loop, from which the soft-threshold formula for the scalar version can be deduced for point-set registration. Experimental results indicate that the registration accuracy of the proposed SM-ICP method is better than that of the established algorithms investigated for comparison. The improvement is especially striking in the registration experiment on the Stanford bunny dataset: with a 50% overlap rate, the trimmed registration error of SM-ICP was 2.04×10-4, one order of magnitude lower than those of the robust Trimmed-ICP (robust Tr-ICP) and ICP algorithms, and approximately three times lower than the error of the sparse ICP (S-ICP) algorithm. In the registration experiments on other objects and on scene data, the registration accuracy of SM-ICP also exceeded that of comparable algorithms. In the registration experiments on point sets with different levels of random noise, the trimmed registration error of SM-ICP was 4.90×10-6 to 1.33×10-4, several times to one order of magnitude lower than those of the other algorithms. In the registration experiment on an engine blade, our method achieved accurate point-set registration, whereas the results of comparable algorithms displayed varying degrees of misalignment. In summary, the proposed SM-ICP algorithm displays advantages in accuracy, robustness, and generalization for point-set registration.
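The scalar soft-thresholding operator and the sigmoid balance weight mentioned above can be sketched as follows (the shape parameters k and r0 are assumptions, not the paper's values):

```python
import numpy as np

def soft_threshold(r: np.ndarray, tau: float) -> np.ndarray:
    """Scalar soft-thresholding used inside the ADMM inner loop:
    shrinks small residuals to zero and reduces large ones by tau,
    which is what down-weights outlier correspondences."""
    return np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)

def balance_weight(residual_norm: float, k: float = 10.0,
                   r0: float = 0.5) -> float:
    """Sigmoid balance weight theta for the mixed regularization."""
    return 1.0 / (1.0 + np.exp(-k * (residual_norm - r0)))

print(soft_threshold(np.array([-0.3, 0.05, 1.2]), 0.1))  # [-0.2, 0., 1.1]
print(balance_weight(0.8))                               # close to 1
```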

Optics and Precision Engineering
Sep. 15, 2021, Vol. 29 Issue 9 2255 (2021)
3D object detection based on fusion of point cloud and image by mutual attention
Jun-ying CHEN, Tong-yao BAI, and Liang ZHAO

To use image information to assist point clouds in improving the accuracy of 3D object detection, the image feature space and point cloud feature space must be adaptively aligned and fused. A deep learning network based on the adaptive fusion of multimodal features was proposed for 3D object detection. First, a voxelization method was used to partition the point cloud into even voxels; each voxel feature was derived from the features of the points it contains, and a 3D sparse convolutional neural network was used to learn the point cloud features. Simultaneously, a ResNet-like neural network was used to extract the image features. Next, the image features and point cloud features were adaptively aligned by introducing a mutual attention module, yielding point cloud features enhanced by the image features. Finally, based on the derived features, a region proposal network (RPN) and a multitask learning network for classification and regression were applied to achieve 3D object detection. Experimental results on the KITTI 3D object detection dataset showed that the average precision was 88.76%, 77.63%, and 76.14% on the easy, medium, and difficult levels of car detection, respectively. The proposed method can effectively fuse image and point cloud information and improve the precision of 3D object detection.
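A minimal sketch of the voxelization step (mean-pooled voxel features for illustration; richer per-voxel encodings and the sparse 3D convolutions are not reproduced):

```python
import numpy as np

def voxelize(points: np.ndarray, voxel_size=(0.2, 0.2, 0.4),
             max_pts: int = 32) -> dict:
    """Group points (N, 3) into evenly spaced voxels keyed by integer
    grid index; each voxel feature here is simply the mean of up to
    max_pts of its points."""
    idx = np.floor(points / np.array(voxel_size)).astype(int)
    voxels = {}
    for key, p in zip(map(tuple, idx), points):
        voxels.setdefault(key, []).append(p)
    return {k: np.mean(v[:max_pts], axis=0) for k, v in voxels.items()}

cloud = np.random.rand(1000, 3) * 10.0     # stand-in point cloud
features = voxelize(cloud)                 # {voxel index: mean point}
print(len(features), "non-empty voxels")
```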

Optics and Precision Engineering
Sep. 15, 2021, Vol. 29 Issue 9 2247 (2021)
Discretization matching of weakly-correlated speckle images in oblique field of view
Mei-tu YE, Jin LIANG, Lei-gang LI, Mao-dong REN, and Ren-hong CHEN

To solve the problem of weak correlation between speckle images caused by oblique perspectives, a discretized matching method for digital speckle images at large oblique angles is proposed. Based on the rule that the matching of small-sized subsets is relatively little affected by tilt, large-sized seed points are first discretized into clusters of small-sized seed points. The clusters are then matched, and the matching results for the small-sized seed point clusters are integrated to obtain initial values for the large-sized seed points, whose accurate matching results are obtained after precise adjustment. The oblique images are then completely matched through a seed point diffusion matching strategy, and finally the deformations can be calculated. An image sequence rotated through angles spanning 0–42° was generated by simulation, and matching tests were carried out over subset radii of 7–30 pixels. Based on these tests, the oblique angle and subset size were determined to be the main factors affecting image matching, and suggestions for selecting the key parameters critical to successful matching of oblique images are given. The efficacy of the proposed method is verified, and its comprehensive performance is evaluated. Numerical simulations and experimental results indicate that the matching accuracy of the proposed method is within ±0.03 pixel. We also demonstrate that the proposed method effectively improves the success rate of speckle image correlation in oblique fields of view, satisfying the requirements for stable matching of oblique speckle images and deformation measurement below 40°.

Optics and Precision Engineering
Sep. 15, 2021, Vol. 29 Issue 9 2235 (2021)
Remote sensing image feature extraction and classification based on contrastive learning method
Xiao-dong MU, Kun BAI, Xuan-ang YOU, Yong-qing ZHU, and Xue-bing CHEN

To address the lack of labeled data in feature extraction and classification from remote sensing images using deep learning, a simple contrastive learning method using an asymmetric predictor is proposed. First, the input image is augmented using horizontal flipping, color jitter, and grayscale conversion to obtain two related views of the same image. They are then fed into the two branches of a Siamese network for feature extraction. Next, asymmetric predictors are used to transform the features, and the network is optimized by maximizing the similarity between them. Finally, a linear classifier is trained with the backbone parameters fixed to complete the feature classification. When 20% of the labeled samples are used for fine-tuning on the four public remote sensing image datasets NWPU-Resisc45, EuroSAT, UC Merced, and Siri-WHU, the classification accuracies are 77.57%, 87.70%, 60.52%, and 65.83%, respectively. The proposed method can effectively extract high-level semantic features from remote sensing images without data labels and performs better than the ImageNet pre-trained model and the latest contrastive learning method SimSiam when labeled samples are scarce.
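The asymmetric-predictor objective is, in SimSiam style, a symmetrized negative cosine similarity with a stop-gradient on the target branch; a minimal PyTorch sketch (tensor names are hypothetical stand-ins for encoder and predictor outputs):

```python
import torch
import torch.nn.functional as F

def neg_cosine(p: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Negative cosine similarity; z.detach() applies the stop-gradient
    so only the predictor branch receives gradients from this term."""
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def siamese_loss(p1, z1, p2, z2):
    """Symmetrized loss over the two augmented views: each view's
    prediction is pulled toward the other view's feature."""
    return 0.5 * (neg_cosine(p1, z2) + neg_cosine(p2, z1))

# p_i = predictor(encoder(view_i)); z_i = encoder(view_i)  (assumed names)
p1, p2 = torch.randn(8, 128), torch.randn(8, 128)
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(siamese_loss(p1, z1, p2, z2))
```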

Optics and Precision Engineering
Sep. 15, 2021, Vol. 29 Issue 9 2222 (2021)
Dense irregular text detection based on multi-dimensional convolution fusion
Yue-bo MENG, De-wang SHI, Guang-hui LIU, Sheng-jun XU, and Dan JIN

Natural-scene text detection algorithms based on deep learning have made significant progress; however, difficulties remain for texts with dense and irregular layouts. Owing to the small spacing and dense distribution of such text, feature extraction is difficult and detection remains incomplete. Meanwhile, existing text detection methods often directly splice features of different dimensions, leading to insufficient multi-scale feature fusion and the loss of semantic information. To solve these problems, a dense irregular text detection method based on multi-dimensional convolution fusion is proposed. The network follows the FPN structure and utilizes a text enhancement module (TEM): by using an additional global text map, the network pays special attention to text information. A channel fusion strategy (CFS) is proposed, which uses a bottom-up method to establish a high-low dimensional feature information chain, generating feature maps with richer semantics and reducing information loss. In the prediction stage, text prediction results are generated through gradual expansion of the text kernel. Experimental results on the DAST1500, ICDAR2015, and CTW1500 datasets yield F-measures of 81.8%, 83.8%, and 79.0%, respectively. The proposed algorithm not only performs better on dense and irregular text detection but also shows a certain level of competitiveness on general natural scene texts (multi-directional and curved text).

Optics and Precision Engineering
Sep. 15, 2021, Vol. 29 Issue 9 2210 (2021)
Signal processing methods of phase sensitive optical time domain reflectometer:a review
Man-ling TIAN, Dong-hui LIU, Xiao-min CAO, and Kuang-lu YU

Phase-sensitive optical time domain reflectometry is widely used in perimeter security and other fields because of its advantages of wide monitoring range and high sensitivity. In recent years, researchers have improved optical systems to increase sensing distance and spatial resolution, thus greatly increasing the amount of data that needs to be processed. In addition, strong environmental noise and diverse types of vibrations bring challenges to the practical application of distributed vibration sensing systems. This study summarizes the signal processing methods used to improve the signal-to-noise ratio and vibration recognition rate of the system, including noise reduction algorithms, feature extraction algorithms, machine learning, and deep learning algorithms; compares the advantages and disadvantages of different algorithms; and finally outlines the possible direction of signal processing methods in this field in the future.

Optics and Precision Engineering
Sep. 15, 2021, Vol. 29 Issue 9 2189 (2021)
Spacecraft structure reconstruction with fiber bragg grating and incremental extreme learning machine
Fu-sheng ZHANG, Lei ZHANG, and Yang ZHAO

A strain detection and reconstruction system was developed to monitor deformations of spacecraft plate surfaces. To this end, a displacement field reconstruction method was proposed based on fiber Bragg grating sensors and an incremental extreme learning machine (I_ELM). A flat plate device with fixed supports on four sides was designed for strain detection and deformation reconstruction; each panel had 12 sensors evenly distributed in a 4×3 grid, and accuracy and stability were improved by fully bonding the sensors to the surface. A model for predicting structural deformation was then designed based on the I_ELM; after training, the model could effectively predict the structural deformation displacement. Finally, 3D restoration was realized by incorporating cubic spline interpolation. The average absolute error of the proposed approach was less than 0.05 mm, and the root mean square error was less than 0.005 mm. Therefore, this method can be used to monitor the deformation of spacecraft surfaces.
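A basic (non-incremental) extreme learning machine can be sketched as follows: a random fixed hidden layer followed by a least-squares output solve; the incremental variant adds hidden nodes progressively instead of solving once. The toy data below are assumptions:

```python
import numpy as np

def elm_train(x, t, n_hidden=50, seed=0):
    """Random fixed hidden layer; output weights via the Moore-Penrose
    pseudo-inverse (least-squares solution)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    h = np.tanh(x @ w + b)
    beta = np.linalg.pinv(h) @ t
    return w, b, beta

def elm_predict(x, w, b, beta):
    return np.tanh(x @ w + b) @ beta

# Toy mapping from 12 strain readings to a displacement value.
x = np.random.rand(200, 12)                # simulated strain samples
t = x @ np.random.rand(12, 1)              # hypothetical ground truth
w, b, beta = elm_train(x, t)
print(np.abs(elm_predict(x, w, b, beta) - t).mean())   # small error
```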

Optics and Precision Engineering
Dec. 15, 2021, Vol. 29 Issue 12 2964 (2021)
Efficient coding and reconstruction for optical remote sensing images
Lei XIN, Feng LI, Xiao-tian LU, Zhi-yi ZHAO, and Ji-jin ZHAO

Based on compressed sensing theory, an efficient coding and reconstruction method for optical remote sensing images is proposed to reduce the data acquisition and transmission pressure faced by large-area scan cameras. First, a multi-domain perception matrix is constructed combining the spatial and compressed sensing domains; compression is performed during sampling, and multiple compressed-domain information is obtained. Then, for the multi-domain compressed information, a reconstruction method based on the Huber function is proposed to rapidly reconstruct high-fidelity images. The proposed optical image coding and reconstruction techniques achieve higher structural similarity (SSIM) and PSNR than JPEG compression methods. On images from the Jilin-1 satellite, as well as single-target and scene infrared images, the PSNR reaches 40 dB and the SSIM exceeds 0.8. Based on these findings, an efficient system for coding and restoring optical images is designed. The proposed system can meet the needs of rapid compression and high-fidelity reconstruction on the satellite.

Optics and Precision Engineering
Dec. 15, 2021, Vol. 29 Issue 12 2956 (2021)
Image caption of space science experiment based on multi-modal learning
Pei-zhuo LI, Xue WAN, and Sheng-yang LI

To enable scientists to quickly locate the key processes of an experiment and obtain detailed information on the experimental procedure, descriptive content must be added to space science experiment imagery automatically. To address the problems of small targets and small data samples in space science experiments, this paper proposes an image captioning method for space science experiments based on multi-modal learning. It consists of four parts: a semantic segmentation model based on an improved U-Net, space science experiment vocabulary candidates based on the semantic segmentation, general scene image feature vector extraction using a bottom-up model, and image captioning based on multi-modal learning. In addition, a dataset of space science experiments is constructed, including semantic masks and image caption annotations. Experimental results demonstrate that, compared with the state-of-the-art image captioning model neuraltalk2, the accuracy of the proposed algorithm improves by 0.089 on METEOR and 0.174 on SPICE. The method overcomes the difficulties of small targets and small data samples in space science experiments, constructs a multi-modal learning model for space science experiment image captioning that meets the requirements of describing space science experiments professionally and accurately, and advances from low-level sensing to deep scene understanding.

Optics and Precision Engineering
Dec. 15, 2021, Vol. 29 Issue 12 2944 (2021)
Lunar surface sampling point selection of Chang’E 5
Yan-hong ZHENG, Xiang-jin DENG, Zheng GU, Sheng-yi JIN, and Qing LI

The Chang’E 5 explorer successfully realized China’s first mission of sampling and returning from an extraterrestrial body, collecting multi-point lunar samples using the surface sampling mode. The process of surface sampling is expounded according to the characteristics of Chang’E 5. The coverage of the obliquely mounted monitoring cameras is analyzed, and a three-dimensional (3D) digital reconstruction workflow adapted to uneven illumination and highly similar textures is developed. Considering the layout of the surface sampling manipulator and the other surface devices of the lander, the reachability constraint of the sampling motion is constructed, and the sampling area that is both visible and reachable is delineated. For a class of large-scale samplers, a multi-sampling-point selection method combining digital simulation and physical verification is proposed. The proposed method, consisting of analysis, simulation, and verification, was applied in the Chang’E 5 surface sampling task. An average precision of better than 1 cm was realized for the reconstruction of the physical terrain, and sampling points with circumferential safety spacing greater than 15 cm and longitudinal safety spacing greater than 2 cm were confirmed. The mission results indicate that the selected sampling points were correct and reliable, and the method effectively supported the Chang’E 5 in its surface sampling activities.

Optics and Precision Engineering
Dec. 15, 2021, Vol. 29 Issue 12 2935 (2021)
Geometric correction of optical remote sensing satellite images captured by linear array sensors circular scanning perpendicular to the orbit
Wu XUE, Peng WANG, and Ling-yu ZHONG

The new linear-array circular-scanning optical remote sensing satellite achieves both a super-large swath and high resolution through circular scanning perpendicular to the orbit and splicing along the orbit. However, this special imaging mode results in serious geometric distortion of the images, which requires correction. A geometric correction method for linear-array circular-scanning optical remote sensing satellite images is proposed. Firstly, a rigorous imaging model of the linear-array circular-scanning image is constructed based on analysis of the satellite's imaging characteristics. Subsequently, the image is preliminarily corrected using the image orbit and attitude parameters and open-source DEM data. Finally, several homonymous points obtained by matching against a reference orthophoto are used as control points, and the image is accurately corrected using spline function fitting. To verify the effectiveness of the proposed method, experiments are performed on simulated images, and the results show that the accuracy of the proposed method can reach 1 pixel. This method can effectively solve the large geometric distortion of linear-array circular-scanning satellite images and provide high-precision image products, and it has significant application value.

Optics and Precision Engineering
Dec. 15, 2021, Vol. 29 Issue 12 2924 (2021)
Siamese network based satellite component tracking
Yun-da SUN, Xue WAN, and Sheng-yang LI

To meet the requirements for the precise positioning of spacecraft components during space missions, this paper proposes a spacecraft component tracking algorithm based on a Siamese neural network, which addresses the common problem of confusing similar components. First, the spacecraft component tracking problem was modeled by training the neural network with data; the Siamese network was designed by improving the AlexNet network. The large public dataset GOT-10k was used to train the Siamese network, and stochastic gradient descent was used to optimize it. Finally, to eliminate the positioning confusion caused by the resemblance of similar spacecraft parts, a tracking strategy combining motion sequence characteristics was developed to improve tracking accuracy. Spacecraft video data published by ESA were used to test the proposed algorithm. The experimental results show that the proposed algorithm, without using spacecraft-related data for training, achieves intersection-over-union scores of 57.2% and 73.1% for tracking the cabin and the solar panel, respectively, at a speed of 38 FPS. This demonstrates that the proposed method can meet the requirements for stable and reliable tracking of spacecraft components with high precision and strong anti-interference capability.

Optics and Precision Engineering
Dec. 15, 2021, Vol. 29 Issue 12 2915 (2021)
Stereo celestial positioning of a space target by space-based double satellites
Ju-bo ZHAO, Ting-ting XU, Xiu-bin YANG, and Qiang YONG

Based on the principle of celestial positioning, a stereo positioning method using two observation satellites was proposed to improve the accuracy of space-based detection and positioning. First, according to the performance of the optical camera, the imaging characteristics of the space target on the optical sensor were analyzed, and the position of the target on the image plane was accurately extracted using the threshold centroid method. Next, based on the coordinate transformation matrix between the target and the observation sensor, the observation vector model was established in the inertial coordinate system. Then, the stereo geometric positioning model of the two observation satellites was established based on the principle of least squares. This model realizes the position transformation of the space target from the two-dimensional image to three-dimensional space. Finally, experimental star images capturing the target were generated through ground experiments, and error simulations were performed to verify the positioning algorithm. Simulation results indicate that the positioning precision of the proposed approach can reach 10^-5 m in the absence of errors. When five types of errors, including satellite orbit and attitude, camera installation, and centroid extraction errors, were considered, the positioning error obeyed a normal distribution with a standard deviation of 114.62 m and a mean value of 0, meeting the requirements. This study thus provides a novel method for space-based high-precision detection and positioning.
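The least-squares core of such two-satellite positioning can be sketched as follows: given each satellite's inertial position and the unit line-of-sight vector derived from its centroid measurement, solve for the two ranges and take the midpoint of closest approach (the positions and vectors below are made-up placeholders):

import numpy as np

def triangulate(p1, u1, p2, u2):
    # Solve p1 + t1*u1 ≈ p2 + t2*u2 for the ranges (t1, t2) in the
    # least-squares sense, then return the midpoint of closest approach.
    A = np.column_stack((u1, -u2))
    t, *_ = np.linalg.lstsq(A, p2 - p1, rcond=None)
    return 0.5 * ((p1 + t[0] * u1) + (p2 + t[1] * u2))

# Placeholder inertial positions (m) and unit line-of-sight vectors:
p1 = np.array([7.0e6, 0.0, 0.0]); u1 = np.array([-0.6, 0.8, 0.0])
p2 = np.array([0.0, 7.0e6, 0.0]); u2 = np.array([0.8, -0.6, 0.0])
target = triangulate(p1, u1 / np.linalg.norm(u1), p2, u2 / np.linalg.norm(u2))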

Optics and Precision Engineering
Dec. 15, 2021, Vol. 29 Issue 12 2902 (2021)
Predefined-time sliding mode control for rigid spacecraft
Hua-yang SAI, Zhen-bang XU, Shuai HE, En-yang ZHANG, and Chao QIN

To counteract system uncertainty and external disturbances in attitude tracking control for rigid spacecraft, a predefined-time sliding mode controller (PTSMC) is proposed. First, the spacecraft attitude tracking system is formulated with quaternion parameterization, and the predefined-time sliding surface is designed using the error quaternion and error angular velocity. Then, considering the uncertainties and external disturbances of the spacecraft system, a PTSMC with a non-conservative upper bound is designed, and the noise of the system is reduced using a boundary-layer technique. Finally, by designing a Lyapunov function, the predefined-time stability of the proposed controller and the non-conservative upper bound on the system convergence time are demonstrated. The simulation results show that with the proposed approach, the attitude tracking accuracy of a rigid spacecraft can reach 1.5×10^-6 rad, and the angular velocity tracking accuracy can reach 2×10^-6 rad/s. Compared with existing predefined-time control and non-singular terminal sliding mode control, the proposed controller has a less conservative upper bound on the stabilization time as well as higher tracking accuracy and robustness. The effectiveness of the control scheme is further illustrated by an attitude tracking experiment on a 3-DOF airborne platform, in which the angle tracking error is less than 0.1 rad and the position tracking error is less than 0.2 m.
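A toy sketch of the two ingredients named above, with illustrative gains rather than the paper's design:

import numpy as np

def sliding_variable(qe, we, k=0.5, r=0.6):
    # s = we + k*|qe|^r*sign(qe) on the vector part of the error quaternion;
    # fractional powers of this kind underpin predefined/fixed-time surfaces
    # (k and r here are illustrative, not the paper's gains).
    return we + k * np.sign(qe) * np.abs(qe) ** r

def sat(s, eps=1e-3):
    # Boundary-layer replacement for sign(s) that attenuates chattering.
    return np.clip(s / eps, -1.0, 1.0)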

Optics and Precision Engineering
Dec. 15, 2021, Vol. 29 Issue 12 2891 (2021)
Wind vector measurement based on ultrasonic sensors in mixed α-stable and Gaussian noise
Yi-ran SHI, Jin-wei QI, Si-ning QU, and Yang ZHAO

To ensure high accuracy and a wide measurement range in ultrasonic wind vector measurement under mixed α-stable and Gaussian noise, a novel dual-phase measurement method based on the fractional lower-order moment (FLOM) is proposed. First, the FLOM operator is used to suppress the mixed α-stable and Gaussian noise; this overcomes the shortcoming that traditional second-order moments and higher-order cumulants cannot be applied to such mixed noise. Then, the time-delay estimation problem is transformed into a phase estimation problem, and a FLOM-based dual-phase measurement method exploiting the orthogonality of the reference signals is proposed. This method effectively eliminates the influence of amplitude variation on the measurement accuracy. The simulation results show that the measurement accuracy and measurement range of the proposed method are higher than those of the traditional time-delay estimation method for wind speeds of 0-70 m/s. Even at an SNR of -10 dB, the RMSE of the wind speed measurement is less than 1.5 m/s, and that of the wind direction angle measurement is less than 2°. Practical application results show that the RMSEs of the wind speed and wind direction angle measurements are 0.104 m/s and 0.54°, respectively, under strong winds. The proposed method can therefore estimate the wind vector in mixed α-stable and Gaussian noise more accurately than the time-delay estimation method.
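The phase-readout idea can be sketched as follows: apply the FLOM nonlinearity to the received tone so that impulsive α-stable noise gains finite moments, then correlate against two orthogonal references (the exponent p and the signal layout are illustrative assumptions):

import numpy as np

def flom_phase(r, fs, f0, p=1.2):
    # FLOM nonlinearity |r|^(p-1)*sign(r), with 1 < p < alpha, tames the
    # heavy tails while preserving the phase of the fundamental tone.
    t = np.arange(len(r)) / fs
    g = np.sign(r) * np.abs(r) ** (p - 1.0)
    c = np.mean(g * np.cos(2 * np.pi * f0 * t))
    s = np.mean(g * np.sin(2 * np.pi * f0 * t))
    return np.arctan2(-s, c)

# The wind speed then follows from the phase (i.e., transit-time) difference
# between the upstream and downstream propagation directions.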

Optics and Precision Engineering
Nov. 15, 2021, Vol. 29 Issue 11 2734 (2021)
Fast phase unwrapping algorithm based on region segmenting with mathematical morphology
Meng-xia LI, Bo CAO, Jia-wei LU, Kai-Hua CUI, and Qian LIU

The key step of optical interferometry is phase unwrapping, which is expected to be computationally fast, highly precise, and widely applicable. Exploiting the characteristic that a wrapped phase map exhibits significant edges between fringes of different orders, a fast unwrapping algorithm based on region segmenting with mathematical morphology (RSMM) is proposed. First, mathematical morphology is applied to extract the boundaries and segment the regions of the phase map. Then, the phase differences between adjacent regions are calculated to determine the fringe order and elevation offset of each region, as well as those of the pixels on the boundaries. Finally, the wrapped phases in the regions and on the boundaries are elevated individually according to the computed offsets to obtain the unwrapped phase map. Simulations and experiments indicate that RSMM requires less than 1 s to unwrap a 1 000×1 000 pixel phase map, less than a quarter of the computation time of conventional least-squares algorithms. In addition, the phase unwrapping performance is not influenced by phase boundaries, data dropout, or noise. The RSMM algorithm offers high speed, broad adaptability, and high accuracy, and is promising for measurement applications with demanding computation-speed requirements, such as dynamic interferometry, optical holography, and fringe projection profilometry.
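A compact sketch of this region-wise scheme, assuming scipy and a simple morphological-gradient edge detector (the threshold and flood-fill details are illustrative, not the paper's exact procedure):

import numpy as np
from scipy import ndimage

def unwrap_rsmm(wrapped):
    two_pi = 2.0 * np.pi
    # 1) Morphological gradient (dilation minus erosion) peaks near the
    #    2π jumps, so thresholding it marks the fringe-order boundaries.
    edges = ndimage.morphological_gradient(wrapped, size=3) > np.pi
    labels, n = ndimage.label(~edges)
    # 2) Vote on the integer order step between regions that face each
    #    other across the one-pixel boundary band (pixels two apart).
    votes = {}
    for ax in (0, 1):
        la = labels.take(np.arange(labels.shape[ax] - 2), axis=ax)
        lb = labels.take(np.arange(2, labels.shape[ax]), axis=ax)
        pa = wrapped.take(np.arange(wrapped.shape[ax] - 2), axis=ax)
        pb = wrapped.take(np.arange(2, wrapped.shape[ax]), axis=ax)
        m = (la > 0) & (lb > 0) & (la != lb)
        step = np.round((pa[m] - pb[m]) / two_pi).astype(int)
        for a, b, s in zip(la[m], lb[m], step):
            votes.setdefault((a, b), []).append(s)
            votes.setdefault((b, a), []).append(-s)
    # 3) Flood-fill fringe orders over the region adjacency graph.
    adj = {}
    for (a, b), ss in votes.items():
        adj.setdefault(a, []).append((b, int(np.median(ss))))
    order = np.zeros(n + 1)
    seen, stack = {1}, [1]
    while stack:
        a = stack.pop()
        for b, s in adj.get(a, []):
            if b not in seen:
                seen.add(b)
                order[b] = order[a] + s
                stack.append(b)
    # 4) Elevate each region; boundary pixels take the nearest region's order.
    idx = ndimage.distance_transform_edt(labels == 0, return_distances=False,
                                         return_indices=True)
    return wrapped + two_pi * order[labels[tuple(idx)]]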

Optics and Precision Engineering
Nov. 15, 2021, Vol. 29 Issue 11 2724 (2021)
Precision control model of rainfall inversion based on visual sensor node collaboration
Xing WANG, Mei-zhen WANG, and Xue-jun LIU

Widespread video sensors continuously record rainfall information. Video-based rainfall estimation, with its high spatio-temporal resolution, has become one of the most promising methods of rainfall data collection. However, owing to the variability of sensor devices, video scenarios, and other factors, the quality of the estimated rainfall data often differs considerably between individual visual sensors, and further processing is required to ensure the quality of the rainfall inversion results. Inspired by Tobler's First Law of Geography, this study presents a precision control model (PCM) for correcting video-based rainfall inversion results. The model uses the spatio-temporal information between camera nodes within the visual sensor network as the constraint. Rainfall events are analyzed along the dimensions of spatio-temporal consistency, situational consistency, and correlation to achieve a high-precision representation of the rainfall data, and a multi-granularity filtering method based on the mutual verification of rainfall information among video nodes is adopted for rainfall inversion. The experimental results show that the PCM can effectively improve rainfall inversion accuracy and stability in various rainfall scenarios: the mean relative error of rainfall intensity (RI) is reduced by approximately 14.85% in light or medium rainfall scenarios and by approximately 19.90% in heavy or violent rainfall scenarios, while the standard deviation of the relative error of RI is reduced by approximately 40.87% and 40.96% in the respective scenarios. These results confirm that the proposed PCM can support the production of high-quality rainfall data.

Optics and Precision Engineering
Nov. 15, 2021, Vol. 29 Issue 11 2714 (2021)
Object detection algorithm based on image and point cloud fusion with N3D_DIOU
Bao-qing GUO, and Guang-fei XIE

Object detection is the basis of autonomous driving and robot navigation. To address the insufficient information in 2D images and the large data volume, uneven density, and low detection accuracy of 3D point clouds, a new 3D object-detection network based on deep-learning fusion of images and point clouds is proposed. To reduce the computational load, the original point cloud is first filtered with the frustum corresponding to the object's bounding box detected in the 2D image. To address the uneven density, an improved voting model network based on a generalized Hough transform is proposed for multiscale feature extraction. Finally, a novel loss function, Normal Three-Dimensional Distance Intersection over Union (N3D_DIOU), is extended from the Two-Dimensional Distance Intersection over Union (2D DIOU) loss; it improves the consistency between the generated and target boxes and thereby the object-detection accuracy on point clouds. Experiments on the KITTI dataset show that our algorithm improves the 3D detection accuracy by 0.71% and the bird's-eye-view detection accuracy by 7.28% compared with outstanding classical methods.
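For the loss itself, a minimal axis-aligned sketch (ignoring the heading angle and the normal-direction term that the full N3D_DIOU presumably carries):

import numpy as np

def diou_3d(box_a, box_b):
    # Boxes as np.array([cx, cy, cz, l, w, h]), axis-aligned for simplicity.
    a_min, a_max = box_a[:3] - box_a[3:] / 2, box_a[:3] + box_a[3:] / 2
    b_min, b_max = box_b[:3] - box_b[3:] / 2, box_b[:3] + box_b[3:] / 2
    inter = np.prod(np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None))
    iou = inter / (np.prod(box_a[3:]) + np.prod(box_b[3:]) - inter)
    # Distance penalty: squared center offset over the squared diagonal of
    # the smallest enclosing box, the 3D analogue of 2D DIoU.
    diag2 = np.sum((np.maximum(a_max, b_max) - np.minimum(a_min, b_min)) ** 2)
    return iou - np.sum((box_a[:3] - box_b[:3]) ** 2) / diag2

# Training would minimize 1 - diou_3d(pred, target).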

Optics and Precision Engineering
Nov. 15, 2021, Vol. 29 Issue 11 2703 (2021)
Dehazing using a decomposition-composition and recurrent refinement network based on the physical imaging model
Yan-ru FENG, and Yi-bin WANG

To explore the dehazing priors and the constraints among the physical parameters of imaging under haze conditions, and to improve dehazing accuracy, we propose a decomposition-composition and recurrent refinement network for image dehazing based on the physical imaging model. Unlike existing dehazing methods, it contains a transmission prediction branch and a clear image prediction branch. Both branches are built on a multi-scale pyramid encoder-decoder network with a recurrent unit, which utilizes multiscale contextual features and allows more complete information exchange. Considering that the transmission map is related to the scene depth and haze concentration, it can be regarded as a haze concentration prior that guides the clear image prediction branch to estimate and refine the dehazing result recurrently. Similarly, the clear image, which contains scene depth information, is regarded as a depth prior that guides the transmission prediction branch to predict and refine the transmission map. The predicted transmission map and clear image are then re-synthesized into a haze image that serves as the network input at each recurrent step, forcing the predictions to satisfy the constraints of the physical imaging model. The experimental results demonstrate that our method not only achieves a good dehazing effect on both synthetic and real images but also outperforms existing methods both qualitatively and quantitatively. The average processing time for a single hazy image is 0.037 s, indicating potential application value in the engineering practice of image dehazing.
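The recomposition step follows the standard atmospheric scattering model, I = J·t + A·(1 − t); a one-function sketch (with J an H×W×3 image, t an H×W transmission map, and A the atmospheric light):

import numpy as np

def recompose_haze(J, t, A):
    # I = J*t + A*(1 - t): re-synthesize the hazy image from the predicted
    # clear image J and transmission t, so that both prediction branches
    # must stay consistent with the physical imaging model.
    return J * t[..., None] + A * (1.0 - t[..., None])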

Optics and Precision Engineering
Nov. 15, 2021, Vol. 29 Issue 11 2692 (2021)
Identification and compensation of friction for modular joints based on grey wolf optimizer
Jing-kai CUI, Hua-yang SAI, En-yang ZHANG, Ming-chao ZHU, and Zhen-bang XU

To identify the friction model parameters of a modular joint, an off-line identification method that compensates for joint friction is proposed. First, the structure and control system of the modular joint are presented, and the dynamic model of the joint is established. Second, the LuGre friction model is developed, and the grey wolf algorithm and a piecewise least-squares algorithm with a pseudo-random sequence are used to identify the model parameters. The results of the two methods are compared and analyzed, and a feed-forward compensation algorithm based on the LuGre friction model is designed and verified experimentally. The experimental results indicate that, compared with the piecewise least-squares method, the identification accuracy of the grey wolf algorithm is improved by 19.2%; the joint velocity tracking error decreases from 0.295 (°)/s to 0.183 (°)/s for a sinusoidal velocity reference with an amplitude of 1 (°)/s and a frequency of 10 Hz; and the velocity-loop bandwidth increases from 12 Hz to 18 Hz after friction compensation. Repeated experiments show that the identified parameters exhibit high repeatability, verifying the suitability of the proposed method. The proposed feed-forward friction compensation algorithm can be used to improve the dynamic performance of the joint control system.
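As a sketch of the identification step, the steady-state LuGre (Stribeck) curve can be fitted to measured velocity-friction pairs with a minimal grey wolf optimizer; the bounds, pack size, and cost below are illustrative assumptions:

import numpy as np

def stribeck(v, fc, fs, vs, sigma2):
    # Steady-state LuGre friction: Coulomb + Stribeck + viscous terms.
    return np.sign(v) * (fc + (fs - fc) * np.exp(-(v / vs) ** 2)) + sigma2 * v

def gwo(cost, lb, ub, wolves=20, iters=200, seed=0):
    # Minimal grey wolf optimizer: the pack moves toward the three best
    # wolves (alpha, beta, delta) with an exploration factor decaying 2 -> 0.
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    X = rng.uniform(lb, ub, (wolves, lb.size))
    for k in range(iters):
        f = np.array([cost(x) for x in X])
        alpha, beta, delta = X[np.argsort(f)[:3]]
        a = 2.0 * (1.0 - k / iters)
        Xnew = np.zeros_like(X)
        for leader in (alpha, beta, delta):
            A = a * (2.0 * rng.random(X.shape) - 1.0)
            C = 2.0 * rng.random(X.shape)
            Xnew += leader - A * np.abs(C * leader - X)
        X = np.clip(Xnew / 3.0, lb, ub)
    return X[np.argmin([cost(x) for x in X])]

# p_hat = gwo(lambda p: np.mean((stribeck(v_meas, *p) - f_meas) ** 2),
#             lb=[0.0, 0.0, 1e-3, 0.0], ub=[5.0, 10.0, 1.0, 1.0])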

Optics and Precision Engineering
Nov. 15, 2021, Vol. 29 Issue 11 2683 (2021)
Recognition of small targets in remote sensing images using a multi-scale feature fusion-based single shot multi-box detector
Xin CHEN, Min-jie WAN, Chao MA, Qian CHEN, and Guo-hua GU

For the detection of small remote sensing targets against complex backgrounds, an improved multi-scale feature fusion-based single shot multi-box detector (SSD) method was proposed. First, a feature-map fusion mechanism was designed to fuse the shallow high-resolution feature maps with the deep feature maps rich in semantic information, after which feature pyramids were built between the feature maps to enhance small-target features. Subsequently, a channel attention module was introduced to overcome background interference by constructing a weight parameter space that gives more attention to the channels focusing on the target region. Finally, the scale of the prior boxes relative to the original image was adjusted to better fit the scale of small remote sensing targets. Qualitative and quantitative tests on a remote sensing aircraft image dataset show that the proposed method improves the detection accuracy by 4.3% compared with the SSD baseline and can adapt to complex multi-scale remote sensing target detection tasks without reducing the detection rate for small targets.
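The channel attention module can be sketched as a squeeze-and-excitation-style block (a generic construction assumed here, not necessarily the paper's exact wiring):

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Learn per-channel weights from globally pooled features so that
    # channels responding to targets outweigh background-dominated ones.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))      # global average pool -> weights
        return x * w[:, :, None, None]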

Optics and Precision Engineering
Nov. 15, 2021, Vol. 29 Issue 11 2672 (2021)
Crack detection and segmentation in CT images using Hessian matrix and support vector machine
Yong-ning ZOU, Zhi-bin ZHANG, Qi LI, and Hao-song YU

Crack segmentation plays an important role in industrial CT image processing; however, interference in CT images, such as noise and artifacts, can adversely affect its accuracy and precision. To improve crack segmentation precision, this paper analyzes the characteristics of cracks in CT images and proposes a crack recognition and segmentation method that combines a Hessian matrix with a support vector machine. First, a linear filter based on the Hessian matrix is used to extract linear structures from the CT image and enhance their contrast. The texture of these linear-structure images is then described by feature information extracted with a grey-level co-occurrence matrix, which reflects the spatial distribution of gray levels. Next, a crack identification classifier is trained using a support vector machine (SVM) with a radial basis function (RBF) kernel, and the classifier is used to locate the image blocks containing cracks. Finally, the binary segmentation of the cracks is obtained by Otsu threshold segmentation. The experiments demonstrate that the proposed method improves the algorithm's interference resistance by masking regions of non-interest in the image, and the recognition accuracy reaches 94.5%. With its high recognition and segmentation accuracy, the algorithm has practical engineering application value.
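The pipeline maps naturally onto standard library pieces; a sketch assuming scikit-image ≥ 0.19 and scikit-learn, with the Frangi filter standing in for the paper's Hessian-based line filter:

import numpy as np
from skimage.filters import frangi, threshold_otsu
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def block_features(block):
    # GLCM texture descriptors of one 8-bit grayscale image block.
    glcm = graycomatrix(block, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "homogeneity", "energy", "correlation")
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# 1) Hessian-based line filtering enhances crack-like linear structures:
#    enhanced = frangi(ct_slice, black_ridges=True)
# 2) An RBF-kernel SVM is trained on labeled crack / non-crack blocks:
#    clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
# 3) Otsu thresholding segments cracks inside the blocks flagged by clf:
#    mask = enhanced > threshold_otsu(enhanced)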

Optics and Precision Engineering
Oct. 15, 2021, Vol. 29 Issue 10 2517 (2021)
Co-segmentation of three-dimensional shape clusters by shape similarity
Jun Yang, and Min-min Zhang

To accurately capture the contextual information of three-dimensional (3D) point cloud shapes and improve segmentation accuracy, we propose a method for the co-segmentation of 3D shape clusters based on shape similarity. First, farthest point sampling is performed on the point cloud to obtain centroid points, and a random-pick method is used to select the neighborhood points that form a spherical neighborhood around each centroid. Then, a feature aggregation operator is used to encode the geometric and topological relationships of the 3D point cloud: the associated features within each neighborhood are extracted, and a spatial similarity matrix is constructed from the centroid coordinates of the spherical neighborhoods. This matrix weights and sums the local shape features extracted by the encoder network to complete the collaborative analysis of the 3D shapes. Finally, a hierarchical feature extraction network is built to decode the weighted associated features and complete the shape-cluster co-segmentation task. Experimental results show that the co-segmentation accuracy of our algorithm on the ShapeNet Part dataset reaches 86.0%. Compared with the k-nearest-neighbor algorithm, using random selection within a sphere as the neighborhood sampling strategy increases the segmentation accuracy of the network by 1.5%, and compared with shared multilayer perceptrons for feature extraction, using feature aggregation operators for the convolution operations increases it by 5.6%. Moreover, the segmentation accuracy of the proposed algorithm is superior to that of current mainstream shape segmentation algorithms.
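Farthest point sampling, the first step above, has a standard greedy form; a small sketch:

import numpy as np

def farthest_point_sampling(pts, k, seed=0):
    # Greedily pick the point farthest from everything chosen so far,
    # giving well-spread centroids for the spherical neighborhoods.
    rng = np.random.default_rng(seed)
    idx = np.zeros(k, dtype=int)
    idx[0] = rng.integers(len(pts))
    d = np.linalg.norm(pts - pts[idx[0]], axis=1)
    for i in range(1, k):
        idx[i] = int(np.argmax(d))
        d = np.minimum(d, np.linalg.norm(pts - pts[idx[i]], axis=1))
    return idx

# centroids = pts[farthest_point_sampling(pts, 512)]; each spherical
# neighborhood is then filled by randomly picking points within a radius.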

Optics and Precision Engineering
Oct. 15, 2021, Vol. 29 Issue 10 2504 (2021)
Adaptive reconstruction of 3D point cloud by sparse optimization
Xiao-wei FENG, Hai-yun HU, Rui-qing ZHUANG, and Min HE

To suppress noise in 3D point clouds, a feature-preserving reconstruction method based on sparse optimization is proposed that restores sharp features during denoising. First, the curvature of the underlying manifold surface is estimated from the eigenvalues of the local tensor matrix constructed from the neighboring points. To avoid the influence of outliers on normal estimation, pair consistency voting is used for robust statistical identification of feature points in the neighborhood. Within an L0 minimization framework, an adaptive differential operator based on feature identification is introduced to avoid the generation of artifacts during the alternating optimization, and a projection regularization term is used to alleviate surface degradation. Based on the optimized normal field, sharp features are restored by projection optimization. The experimental results show that the reconstructed point cloud error is reduced by 10.2% on average and the normal error by 29.7% on average, and the subjective visual quality is better than that of several state-of-the-art algorithms. The proposed method can effectively improve point cloud quality and provides technical support for point-cloud-based 3D measurement and reverse modeling.
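The curvature estimate in the first step is commonly computed as the "surface variation" of the local tensor (covariance) matrix; a sketch, with neighborhoods assumed to come from a k-d tree query:

import numpy as np
from scipy.spatial import cKDTree

def surface_variation(pts, k=16):
    # Smallest eigenvalue of each local tensor matrix, normalized by the
    # eigenvalue sum: ~0 on flat patches, large near edges and corners.
    nbrs = cKDTree(pts).query(pts, k=k)[1]
    out = np.empty(len(pts))
    for i, nb in enumerate(nbrs):
        q = pts[nb] - pts[nb].mean(axis=0)
        lam = np.linalg.eigvalsh(q.T @ q / k)     # ascending eigenvalues
        out[i] = lam[0] / lam.sum()
    return out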

Optics and Precision Engineering
Oct. 15, 2021, Vol. 29 Issue 10 2495 (2021)
Fine restoration of incomplete images using external features and image features
Tao XU, Ji-yong ZHOU, Guo-liang ZHANG, and Lei CAI

When large areas of an image are missing owing to unspecified factors, existing image restoration models usually cannot repair the image effectively, leading to repair results with discontinuous features. This study proposes a fine restoration method for incomplete images that uses both external features and image features. First, we improve the dynamic memory network (DMN+): the DMN+ scheme combines the in-field features of an incomplete image with related off-field features, generating an optimized version of the defective image that contains both external and image features. Next, a generative adversarial network with piecewise gradient-penalty constraints is constructed, which guides the generator to perform a coarse repair of the optimized incomplete image, yielding a coarse restoration of the target. Finally, the coarse restoration is further refined based on the coherence of related features to obtain the final fine restoration. The proposed algorithm is verified on three image datasets of varying complexity, and its visual and objective results are compared with those of existing dominant restoration models: our restoration results are more structurally sound in terms of texture and are superior both visually and objectively. On the most challenging Underwater Target dataset, the peak signal-to-noise ratio is 27.01 dB, with a structural similarity index of 0.949.
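For the adversarial part, the plain (non-piecewise) gradient-penalty construction that the paper's constraint presumably builds on can be sketched as:

import torch

def gradient_penalty(critic, real, fake):
    # Penalize deviations of the critic's gradient norm from 1 on random
    # interpolates between real and generated images; the paper applies the
    # penalty piecewise, which this plain sketch does not reproduce.
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grad, = torch.autograd.grad(critic(x).sum(), x, create_graph=True)
    return ((grad.flatten(1).norm(2, dim=1) - 1) ** 2).mean()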

Optics and Precision Engineering
Oct. 15, 2021, Vol. 29 Issue 10 2481 (2021)
River underwater 3D vision measurement method
Li XU, Yong-hao ZHOU, Gui-ming LU, Jin-feng ZHOU, Fan ZHANG, and Wen-yu LUO

To remedy the degradation of underwater images caused by mud and sand turbidity in natural river water and the light changes caused by underwater refraction, an underwater vision measurement method based on radial multi-line structured light is proposed. First, an underwater visual measurement model based on refraction is constructed. When the projector axis is perpendicular to the refraction plane and the light plane passes through the projection axis, the light plane in air is coplanar with the underwater one, which avoids the underwater calibration process. The Monte Carlo method is used to analyze the influence of imaging-point errors on the underwater 3D vision measurement model. Next, an interlaced, centrally rotated radial multi-line light pattern is designed. Black and white stripes are used as the projection pattern to reduce the dependence on image quality, and the number of measurement stripes is increased and the measurement resolution improved through central rotation. Finally, 3D topography measurement experiments on the surfaces of underwater objects are conducted, in which the influence of turbid water on edge extraction accuracy, coded-light decoding, and 3D vision measurement is analyzed. In the experiments, sediment of different weights is added to a 1 m×1.2 m×0.8 m water tank to simulate the natural water environment. At a measurement distance of 1 500 mm, the surface of a bottle measured in clean water shows a plane residual error of 0.95 mm; the plane residual errors are 1.93 mm, 5.43 mm, and 21.43 mm in muddy water with 40 g, 60 g, and 90 g of sediment, respectively. When the amount of sediment exceeds a certain value (60 g in this experiment), the accuracy of fringe extraction deteriorates sharply: as the sediment increases from 40 g to 90 g, the residual error of plane fitting in muddy water grows from 1.93 mm to 21.43 mm.
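The refraction model rests on the vector form of Snell's law; a sketch for bending a camera ray at the air-water interface (refractive indices assumed to be 1.0 and 1.333):

import numpy as np

def refract(d, n, n1=1.0, n2=1.333):
    # Vector Snell's law: bend unit ray d at an interface with unit normal n
    # (pointing toward the incoming side), from medium n1 into medium n2.
    cos_i = -np.dot(n, d)
    r = n1 / n2
    k = 1.0 - r * r * (1.0 - cos_i * cos_i)
    if k < 0.0:
        return None                      # total internal reflection
    return r * d + (r * cos_i - np.sqrt(k)) * n

# The refracted ray is then intersected with the (coplanar) light plane to
# recover the 3D point, exactly as in the in-air triangulation.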

Optics and Precision Engineering
Oct. 15, 2021, Vol. 29 Issue 10 2465 (2021)
Super-resolution reconstruction of micro-scanning images
Hao-guang ZHAO, Han-shi QU, Xin WANG, Yang SHANG, Li-gang LIU, Song-wei HAN, Sen MENG, and Ping WANG

To improve the target recognition distance of the airborne electro-optical reconnaissance equipment of a UAV, this study develops a high-speed micro-scanning super-resolution core component based on an actual engineering project, with the real-time super-resolution reconstruction algorithm implemented on the embedded GPU platform TX2i. First, the micro-scanning core component moves according to a preset step size and frequency to obtain a continuous image sequence with sub-pixel offsets. Then, a probability-distribution-based super-resolution reconstruction algorithm processes each set of four consecutive images into a higher-resolution image. The experimental results show that the 120 fps, 640×512 image sequence output by the detector is reconstructed into a 30 fps, 1 280×1 024 image sequence. After super-resolution reconstruction, the effective spatial resolution of the image increases by 78.2% and the target recognition distance increases by 43.3%. The reconstruction time for one high-resolution image is approximately 33 ms. Furthermore, the core component's micro-scan response time is < 1.0 ms and its in-position accuracy is < 0.3 μm (corresponding to approximately 0.03 pixels). These results meet the real-time and precision requirements of airborne electro-optical reconnaissance equipment.
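The geometry of the 2× reconstruction can be seen in a simple shift-and-add sketch: four frames taken at (0,0), (0,½), (½,0), and (½,½)-pixel offsets interleave onto a doubled grid (the actual algorithm weights the samples through a probabilistic observation model rather than placing them directly):

import numpy as np

def interlace_2x(f00, f01, f10, f11):
    # Interleave four micro-scanned low-resolution frames onto a 2x grid;
    # the index order assumes offsets (0,0), (0,1/2), (1/2,0), (1/2,1/2).
    h, w = f00.shape
    hi = np.empty((2 * h, 2 * w), dtype=f00.dtype)
    hi[0::2, 0::2] = f00
    hi[0::2, 1::2] = f01
    hi[1::2, 0::2] = f10
    hi[1::2, 1::2] = f11
    return hi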

Optics and Precision Engineering
Oct. 15, 2021, Vol. 29 Issue 10 2456 (2021)
Intelligent evaluation of grotto surface weathering based on spectral chromatic aberration and principal component feature fusion
Chi-peng CAO, Hui-qin WANG, Ke WANG, Zhan WANG, Gang ZHANG, and Tao MA

To overcome the problem that a single spectral feature cannot finely characterize the type and degree of weathering in complex weathering areas on grotto surfaces, this paper proposes an intelligent quantitative evaluation method for grotto surface weathering based on spectral analysis and colorimetric theory. First, we reconstruct the reflectance spectra from the multispectral image of the grotto surface, calculate the color difference between each pixel and a reference point, and use principal component analysis to extract the principal component features from the multispectral image data. Then, we fuse the spectral color difference and principal component features to characterize the different types and degrees of weathering. Finally, a random forest classifier is used to intelligently evaluate the weathering degree of each pixel in the multispectral image. Experiments show that fusing the spectral color difference and principal component features characterizes the different types and degrees of weathering in complex areas better than a single spectral feature. The evaluation accuracy of the overall weathering degree of the grotto surface is 99.86%, with a Kappa coefficient of 0.99. The proposed method can thus effectively realize a refined characterization of complex weathered areas.
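The feature-fusion-plus-classifier stage can be sketched with scikit-learn; cube, delta_e, train_idx, and labels below are hypothetical placeholders for the multispectral cube, the precomputed per-pixel color difference, and the labeled training pixels:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

def weathering_map(cube, delta_e, train_idx, labels, n_pc=3):
    # Fuse PCA features of the pixel spectra with each pixel's color
    # difference to the reference point, then classify weathering degree.
    X = cube.reshape(-1, cube.shape[-1]).astype(float)
    pc = PCA(n_components=n_pc).fit_transform(X)
    feats = np.hstack([pc, delta_e.reshape(-1, 1)])
    clf = RandomForestClassifier(n_estimators=200).fit(feats[train_idx], labels)
    return clf.predict(feats).reshape(cube.shape[:2])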

Optics and Precision Engineering
Oct. 15, 2021, Vol. 29 Issue 10 2444 (2021)